Elementary 
Linear Algebra 



HOWARD ANTON / CHRIS RORRES 









About The Author 


Howard Anton obtained his B.A. from Lehigh University, his M.A. from the University of Illinois, and his 
Ph.D. from the Polytechnic University of Brooklyn, all in mathematics. In the early 1960s he worked for 
Burroughs Corporation and Avco Corporation at Cape Canaveral, Florida, where he was involved with the 
manned space program. In 1968 he joined the Mathematics Department at Drexel University, where he taught 
full time until 1983. Since then he has devoted the majority of his time to textbook writing and activities for 
mathematical associations. Dr. Anton was president of the EPADEL Section of the Mathematical Association 
of America (MAA), served on the Board of Governors of that organization, and guided the creation of the 
Student Chapters of the MAA. In addition to various pedagogical articles, he has published numerous 
research papers in functional analysis, approximation theory, and topology. He is best known for his textbooks 
in mathematics, which are among the most widely used in the world. There are currently more than 150 
versions of his books, including translations into Spanish, Arabic, Portuguese, Italian, Indonesian, French, 
Japanese, Chinese, Hebrew, and German. For relaxation, Dr. Anton enjoys travel and photography. 


Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



Preface 

This edition of Elementary Linear Algebra gives an introductory treatment of linear algebra that is suitable for 
a first undergraduate course. Its aim is to present the fundamentals of linear algebra in the clearest possible 
way—sound pedagogy is the main consideration. Although calculus is not a prerequisite, there is some 
optional material that is clearly marked for students with a calculus background. If desired, that material can 
be omitted without loss of continuity. 

Technology is not required to use this text, but for instructors who would like to use MATLAB, Mathematica,
Maple, or calculators with linear algebra capabilities, we have posted some supporting material that can be 
accessed at either of the following Web sites: 

www.howardanton.com 

www.wiley.com/college/anton


Summary of Changes in this Edition 

This edition is a major revision of its predecessor. In addition to including some new material, some of the old 
material has been streamlined to ensure that the major topics can all be covered in a standard course. These 
are the most significant changes: 

Vectors in 2-space, 3-space, and n-space Chapters 3 and 4 of the previous edition have been combined 
into a single chapter. This has enabled us to eliminate some duplicate exposition and to juxtapose concepts 
in n-space with those in 2-space and 3-space, thereby conveying more clearly how n-space ideas generalize
those already familiar to the student. 

New Pedagogical Elements Each section now ends with a Concept Review and a Skills Mastery list that
provide the student with a convenient reference to the main ideas in that section.

New Exercises Many new exercises have been added, including a set of True/False exercises at the end of 
most sections. 

Earlier Coverage of Eigenvalues and Eigenvectors The chapter on eigenvalues and eigenvectors, which 
was Chapter 7 in the previous edition, is Chapter 5 in this edition. 

Complex Vector Spaces The chapter entitled Complex Vector Spaces in the previous edition has been 
completely revised. The most important ideas are now covered in Section 5.3 and Section 7.5 in the context 
of matrix diagonalization. A brief review of complex numbers is included in the Appendix. 

Quadratic Forms This material has been extensively rewritten to focus more precisely on the most 
important ideas. 

New Chapter on Numerical Methods In the previous edition an assortment of topics appeared in the last 
chapter. That chapter has been replaced by a new chapter that focuses exclusively on numerical methods of 
linear algebra. We achieved this by moving those topics not concerned with numerical methods elsewhere 
in the text. 

Singular-Value Decomposition In recognition of its growing importance, a new section on Singular-Value 
Decomposition has been added to the chapter on numerical methods. 




Internet Search and the Power Method A new section on the Power Method and its application to 
Internet search engines has been added to the chapter on numerical methods. 

Applications There is an expanded version of this text by Howard Anton and Chris Rorres entitled
Elementary Linear Algebra: Applications Version, 10th edition (ISBN 9780470432051), whose purpose is to
supplement this version with an extensive body of applications. However, to accommodate instructors who 
asked us to include some applications in this version of the text, we have done so. These are generally less 
detailed than those appearing in the Anton/Rorres text and can be omitted without loss of continuity. 


Hallmark Features 


Relationships Among Concepts One of our main pedagogical goals is to convey to the student that linear 
algebra is a cohesive subject and not simply a collection of isolated definitions and techniques. One way in 
which we do this is by using a crescendo of Equivalent Statements theorems that continually revisit 
relationships among systems of equations, matrices, determinants, vectors, linear transformations, and 
eigenvalues. To get a general sense of how we use this technique, see Theorems 1.5.3, 1.6.4, 2.3.8, 4.8.10,
4.10.4, and then Theorem 5.1.6, for example.

Smooth Transition to Abstraction Because the transition from $R^n$ to general vector spaces is difficult for
many students, considerable effort is devoted to explaining the purpose of abstraction and helping the 
student to “visualize” abstract ideas by drawing analogies to familiar geometric ideas. 

Mathematical Precision When reasonable, we try to be mathematically precise. In keeping with the level 
of student audience, proofs are presented in a patient style that is tailored for beginners. There is a brief 
section in the Appendix on how to read proof statements, and there are various exercises in which students 
are guided through the steps of a proof and asked for justification. 

Suitability for a Diverse Audience This text is designed to serve the needs of students in engineering, 
computer science, biology, physics, business, and economics as well as those majoring in mathematics. 

Historical Notes To give the students a sense of mathematical history and to convey that real people 
created the mathematical theorems and equations they are studying, we have included numerous Historical 
Notes that put the topic being studied in historical perspective. 


About the Exercises 


Graded Exercise Sets Each exercise set begins with routine drill problems and progresses to problems 
with more substance. 

True/False Exercises Most exercise sets end with a set of True/False exercises that are designed to check 
conceptual understanding and logical reasoning. To avoid pure guessing, the students are required to justify 
their responses in some way. 

Supplementary Exercise Sets Most chapters end with a set of supplementary exercises that tend to be 
more challenging and force the student to draw on ideas from the entire chapter rather than a specific 
section. 


Supplementary Materials for Students 

Student Solutions Manual This supplement provides detailed solutions to most theoretical exercises and 
to at least one nonroutine exercise of every type (ISBN 9780470458228). 

Technology Exercises and Data Files The technology exercises that appeared in the previous edition have 
been moved to the Web site that accompanies this text. Those exercises are designed to be solved using 
MATLAB, Mathematica, or Maple and are accompanied by data files in all three formats. The exercises and 
data can be downloaded from either of the following Web sites:

www.howardanton.com 

www.wiley.com/college/anton


Supplementary Materials for Instructors 

Instructor's Solutions Manual This supplement provides worked-out solutions to most exercises in the 
text (ISBN 9780470458235). 

WileyPLUS™ This is Wiley's proprietary online teaching and learning environment that integrates a 
digital version of this textbook with instructor and student resources to fit a variety of teaching and learning 
styles. WileyPLUS will help your students master concepts in a rich and structured environment that is 
available to them 24/7. It will also help you to personalize and manage your course more effectively with 
student assessments, assignments, grade tracking, and other useful tools. 

Your students will receive timely access to resources that address their individual needs and will 
receive immediate feedback and remediation resources when needed. 

There are also self-assessment tools that are linked to the relevant portions of the text that will enable 
your students to take control of their own learning and practice. 

WileyPLUS will help you to identify those students who are falling behind and to intervene in a 
timely manner without waiting for scheduled office hours. 

More information about WileyPLUS can be obtained from your Wiley representative. 


A Guide for the Instructor 

Although linear algebra courses vary widely in content and philosophy, most courses fall into two categories 
—those with about 35-40 lectures and those with about 25-30 lectures. Accordingly, we have created long 
and short templates as possible starting points for constructing a course outline. Of course, these are just 
guides, and you will certainly want to customize them to fit your local interests and requirements. Neither of 
these sample templates includes applications. Those can be added, if desired, as time permits. 



                                                      Long Template   Short Template
Chapter 1: Systems of Linear Equations and Matrices    7 lectures      6 lectures
Chapter 2: Determinants                                3 lectures      2 lectures
Chapter 3: Euclidean Vector Spaces                     4 lectures      3 lectures
Chapter 4: General Vector Spaces                      10 lectures     10 lectures
Chapter 5: Eigenvalues and Eigenvectors                3 lectures      3 lectures
Chapter 6: Inner Product Spaces                        3 lectures      1 lecture
Chapter 7: Diagonalization and Quadratic Forms         4 lectures      3 lectures
Chapter 8: Linear Transformations                      3 lectures      2 lectures
Total:                                                37 lectures     30 lectures
CHAPTER 1

Systems of Linear Equations and Matrices


CHAPTER CONTENTS 

Introduction to Systems of Linear Equations
Gaussian Elimination
Matrices and Matrix Operations
Inverses; Algebraic Properties of Matrices
Elementary Matrices and a Method for Finding
More on Linear Systems and Invertible Matrices
Diagonal, Triangular, and Symmetric Matrices
Applications of Linear Systems
    Network Analysis (Traffic Flow)
    Electrical Circuits
    Balancing Chemical Equations
    Polynomial Interpolation
Leontief Input-Output Models


INTRODUCTION 

Information in science, business, and mathematics is often organized into rows and 
columns to form rectangular arrays called “matrices” (plural of “matrix”). Matrices often 
appear as tables of numerical data that arise from physical observations, but they occur in 
various mathematical contexts as well. For example, we will see in this chapter that all of 
the information required to solve a system of equations such as 

$$\begin{aligned} 5x + y &= 3 \\ 2x - y &= 4 \end{aligned}$$

is embodied in the matrix

$$\begin{bmatrix} 5 & 1 & 3 \\ 2 & -1 & 4 \end{bmatrix}$$

and that the solution of the system can be obtained by performing appropriate operations 
on this matrix. This is particularly important in developing computer programs for solving 
systems of equations because computers are well suited for manipulating arrays of 
numerical information. However, matrices are not simply a notational tool for solving 
systems of equations; they can be viewed as mathematical objects in their own right, and 
there is a rich and important theory associated with them that has a multitude of practical 
applications. It is the study of matrices and related topics that forms the mathematical field 
that we call “linear algebra.” In this chapter we will begin our study of matrices. 
1.1 Introduction to Systems of Linear Equations

Systems of linear equations and their solutions constitute one of the major topics that we will study in this 
course. In this first section we will introduce some basic terminology and discuss a method for solving such 
systems. 


Linear Equations 

Recall that in two dimensions a line in a rectangular xy-coordinate system can be represented by an equation of
the form

$$ax + by = c \qquad (a, b \text{ not both } 0)$$

and in three dimensions a plane in a rectangular xyz-coordinate system can be represented by an equation of the
form

$$ax + by + cz = d \qquad (a, b, c \text{ not all } 0)$$

These are examples of “linear equations,” the first being a linear equation in the variables x and y and the second
a linear equation in the variables x, y, and z. More generally, we define a linear equation in the n variables
$x_1, x_2, \ldots, x_n$ to be one that can be expressed in the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \qquad (1)$$

where $a_1, a_2, \ldots, a_n$ and $b$ are constants, and the a's are not all zero. In the special cases where n = 2 or n = 3,
we will often use variables without subscripts and write linear equations as

$$a_1x + a_2y = b \qquad (a_1, a_2 \text{ not both } 0) \qquad (2)$$

$$a_1x + a_2y + a_3z = b \qquad (a_1, a_2, a_3 \text{ not all } 0) \qquad (3)$$

In the special case where b = 0, Equation 1 has the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0 \qquad (4)$$

which is called a homogeneous linear equation in the variables $x_1, x_2, \ldots, x_n$.

EXAMPLE 1 Linear Equations 

Observe that a linear equation does not involve any products or roots of variables. All variables 
occur only to the first power and do not appear, for example, as arguments of trigonometric, 
logarithmic, or exponential functions. The following are linear equations: 

$$x + 3y = 7 \qquad x_1 - 2x_2 - 3x_3 + x_4 = 0$$

$$\tfrac{1}{2}x - y + \tfrac{1}{3}z = -1 \qquad x_1 + x_2 + \cdots + x_n = 1$$

The following are not linear equations:

$$x + 3y^2 = 4 \qquad 3x + 2y - xy = 5$$

$$\sin x + y = 0 \qquad \sqrt{x_1} + 2x_2 + x_3 = 1$$


A finite set of linear equations is called a system of linear equations or, more briefly, a linear system. The
variables are called unknowns. For example, system 5 that follows has unknowns x and y, and system 6 has
unknowns $x_1$, $x_2$, and $x_3$.

$$\begin{aligned} 5x + y &= 3 \\ 2x - y &= 4 \end{aligned} \qquad (5)$$

$$\begin{aligned} 4x_1 - x_2 + 3x_3 &= -1 \\ 3x_1 + x_2 + 9x_3 &= -4 \end{aligned} \qquad (6)$$


The double subscripting on the coefficients $a_{ij}$
of the unknowns gives their location in the
system: the first subscript indicates the equation
in which the coefficient occurs, and the second
indicates which unknown it multiplies. Thus, $a_{12}$
is in the first equation and multiplies $x_2$.


A general linear system of m equations in the n unknowns $x_1, x_2, \ldots, x_n$ can be written as

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \qquad (7)$$

A solution of a linear system in n unknowns $x_1, x_2, \ldots, x_n$ is a sequence of n numbers $s_1, s_2, \ldots, s_n$ for which
the substitution

$$x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n$$

makes each equation a true statement. For example, the system in 5 has the solution

$$x = 1, \quad y = -2$$

and the system in 6 has the solution

$$x_1 = 1, \quad x_2 = 2, \quad x_3 = -1$$

These solutions can be written more succinctly as

$$(1, -2) \quad \text{and} \quad (1, 2, -1)$$

in which the names of the variables are omitted. This notation allows us to interpret these solutions geometrically
as points in two-dimensional and three-dimensional space. More generally, a solution

$$x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n$$

of a linear system in n unknowns can be written as

$$(s_1, s_2, \ldots, s_n)$$

which is called an ordered n-tuple. With this notation it is understood that all variables appear in the same order
in each equation. If n = 2, then the n-tuple is called an ordered pair, and if n = 3, then it is called an ordered
triple.
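Checking whether an n-tuple is a solution is purely mechanical: substitute it into every equation and compare the two sides. The following is a minimal Python sketch, assuming the system is stored as a list of coefficient rows (row i holds $a_{i1}, \ldots, a_{in}$) and a list of constants $b_i$; the function name `is_solution` is hypothetical, not notation from the text. Exact `Fraction` arithmetic is used so that fractional solutions compare correctly.

```python
from fractions import Fraction

def is_solution(coeffs, constants, candidate):
    """Return True if the n-tuple `candidate` satisfies every equation.

    coeffs[i][j] holds a_{ij} (equation i+1, unknown j+1); constants[i] holds b_i.
    """
    for row, b in zip(coeffs, constants):
        lhs = sum(Fraction(a) * Fraction(s) for a, s in zip(row, candidate))
        if lhs != Fraction(b):
            return False  # this equation is not a true statement
    return True

# System 5: 5x + y = 3, 2x - y = 4
print(is_solution([[5, 1], [2, -1]], [3, 4], (1, -2)))            # True
# System 6: 4x1 - x2 + 3x3 = -1, 3x1 + x2 + 9x3 = -4
print(is_solution([[4, -1, 3], [3, 1, 9]], [-1, -4], (1, 2, -1)))  # True
```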


Linear Systems with Two and Three Unknowns 


Linear systems in two unknowns arise in connection with intersections of lines. For example, consider the linear 
system 

$$\begin{aligned} a_1x + b_1y &= c_1 \\ a_2x + b_2y &= c_2 \end{aligned}$$

in which the graphs of the equations are lines in the xy-plane. Each solution (x, y) of this system corresponds to a 
point of intersection of the lines, so there are three possibilities (Figure 1.1.1): 

The lines may be parallel and distinct, in which case there is no intersection and consequently no solution. 
The lines may intersect at only one point, in which case the system has exactly one solution. 

The lines may coincide, in which case there are infinitely many points of intersection (the points on the 
common line) and consequently infinitely many solutions. 




Figure 1.1.1 (panels):
No solution
One solution
Infinitely many solutions (coincident lines)

In general, we say that a linear system is consistent if it has at least one solution and inconsistent if it has no 
solutions. Thus, a consistent linear system of two equations in two unknowns has either one solution or infinitely 
many solutions—there are no other possibilities. The same is true for a linear system of three equations in three 
unknowns

$$\begin{aligned} a_1x + b_1y + c_1z &= d_1 \\ a_2x + b_2y + c_2z &= d_2 \\ a_3x + b_3y + c_3z &= d_3 \end{aligned}$$

in which the graphs of the equations are planes. The solutions of the system, if any, correspond to points where 
all three planes intersect, so again we see that there are only three possibilities—no solutions, one solution, or 
infinitely many solutions (Figure 1.1.2). 












Figure 1.1.2 (panels):
No solutions (three parallel planes; no common intersection)
No solutions (two parallel planes; no common intersection)
No solutions (two coincident planes parallel to the third; no common intersection)
One solution (intersection is a point)
Infinitely many solutions (planes are all coincident; intersection is a plane)


We will prove later that our observations about the number of solutions of linear systems of two equations in two 
unknowns and linear systems of three equations in three unknowns actually hold for all linear systems. That is: 


Every system of linear equations has zero, one, or infinitely many solutions. There are no other 
possibilities. 
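The text proves this fact later. As a preview, here is a minimal Python sketch that classifies a system using a standard criterion not yet developed in this section: compare the rank of the coefficient matrix with the rank of the augmented matrix (defined later in this section). If the ranks differ there is no solution; if they agree and equal the number of unknowns there is exactly one; otherwise there are infinitely many. The helper names `rank` and `count_solutions` are hypothetical, and the rank is computed here by forward elimination with exact fractions.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, via forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # look for a row at or below r with a nonzero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * p for a, p in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def count_solutions(coeffs, constants):
    """Return 'none', 'one', or 'infinitely many' for the linear system."""
    n = len(coeffs[0])  # number of unknowns
    augmented = [list(row) + [b] for row, b in zip(coeffs, constants)]
    rank_a, rank_aug = rank(coeffs), rank(augmented)
    if rank_a < rank_aug:
        return "none"
    return "one" if rank_a == n else "infinitely many"

print(count_solutions([[1, -1], [2, 1]], [1, 6]))    # Example 2: one
print(count_solutions([[1, 1], [3, 3]], [4, 6]))     # Example 3: none
print(count_solutions([[4, -2], [16, -8]], [1, 4]))  # Example 4: infinitely many
```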


EXAMPLE 2 A Linear System with One Solution 


Solve the linear system 


$$\begin{aligned} x - y &= 1 \\ 2x + y &= 6 \end{aligned}$$


We can eliminate x from the second equation by adding -2 times the first equation to
the second. This yields the simplified system

$$\begin{aligned} x - y &= 1 \\ 3y &= 4 \end{aligned}$$

From the second equation we obtain $y = \tfrac{4}{3}$, and on substituting this value in the first equation we
obtain $x = 1 + y = \tfrac{7}{3}$. Thus, the system has the unique solution

$$x = \tfrac{7}{3}, \quad y = \tfrac{4}{3}$$

Geometrically, this means that the lines represented by the equations in the system intersect at the
single point $(\tfrac{7}{3}, \tfrac{4}{3})$. We leave it for you to check this by graphing the lines.
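The two steps of Example 2, eliminating x and then back-substituting, can be sketched for any system of two equations in two unknowns. This is an illustrative sketch, assuming $a_1 \neq 0$ and that the system has exactly one solution; the function `solve_2x2` is hypothetical.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1, a2*x + b2*y = c2 by elimination.

    Assumes a1 != 0 and that the system has a unique solution.
    """
    a1, b1, c1, a2, b2, c2 = map(Fraction, (a1, b1, c1, a2, b2, c2))
    k = -a2 / a1              # multiplier that eliminates x from equation 2
    b2_new = b2 + k * b1      # equation 2 becomes b2_new * y = c2_new
    c2_new = c2 + k * c1
    y = c2_new / b2_new
    x = (c1 - b1 * y) / a1    # back-substitute into equation 1
    return x, y

x, y = solve_2x2(1, -1, 1, 2, 1, 6)  # x - y = 1, 2x + y = 6
print(x, y)  # 7/3 4/3
```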


EXAMPLE 3 A Linear System with No Solutions 


Solve the linear system 


$$\begin{aligned} x + y &= 4 \\ 3x + 3y &= 6 \end{aligned}$$


We can eliminate x from the second equation by adding -3 times the first equation to
the second equation. This yields the simplified system

$$\begin{aligned} x + y &= 4 \\ 0 &= -6 \end{aligned}$$

The second equation is contradictory, so the given system has no solution. Geometrically, this 
means that the lines corresponding to the equations in the original system are parallel and distinct. 
We leave it for you to check this by graphing the lines or by showing that they have the same slope 
but different y-intercepts. 
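The slope-and-intercept check suggested above can also be done numerically. A minimal sketch, assuming each line is written as ax + by = c with $b \neq 0$; `slope_intercept` is a hypothetical helper.

```python
from fractions import Fraction

def slope_intercept(a, b, c):
    """Slope and y-intercept of the line a*x + b*y = c (requires b != 0)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return -a / b, c / b   # from y = (-a/b) x + c/b

line1 = slope_intercept(1, 1, 4)  # x + y = 4
line2 = slope_intercept(3, 3, 6)  # 3x + 3y = 6
same_slope = line1[0] == line2[0]
different_intercepts = line1[1] != line2[1]
print(same_slope and different_intercepts)  # True: parallel and distinct
```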


EXAMPLE 4 A Linear System with Infinitely Many Solutions 

Solve the linear system 

$$\begin{aligned} 4x - 2y &= 1 \\ 16x - 8y &= 4 \end{aligned}$$


In Example 4 we could have also obtained
parametric equations for the solutions by
solving Equation 8 for y in terms of x, and letting
x = t be the parameter. The resulting
parametric equations would look different
but would define the same solution set.


We can eliminate x from the second equation by adding -4 times the first equation to
the second. This yields the simplified system

$$\begin{aligned} 4x - 2y &= 1 \\ 0 &= 0 \end{aligned}$$

The second equation does not impose any restrictions on x and y and hence can be omitted. Thus,
the solutions of the system are those values of x and y that satisfy the single equation

$$4x - 2y = 1 \qquad (8)$$

Geometrically, this means the lines corresponding to the two equations in the original system
coincide. One way to describe the solution set is to solve this equation for x in terms of y to obtain
$x = \tfrac{1}{4} + \tfrac{1}{2}y$ and then assign an arbitrary value t (called a parameter) to y. This allows us to
express the solution by the pair of equations (called parametric equations)

$$x = \tfrac{1}{4} + \tfrac{1}{2}t, \quad y = t$$

We can obtain specific numerical solutions from these equations by substituting numerical values
for the parameter. For example, t = 0 yields the solution $(\tfrac{1}{4}, 0)$, t = 1 yields the solution $(\tfrac{3}{4}, 1)$,
and t = -1 yields the solution $(-\tfrac{1}{4}, -1)$. You can confirm that these are solutions by
substituting the coordinates into the given equations.
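The parametric equations of Example 4 can be sampled for any value of t. This sketch (the function name `example4_solution` is ours, not the text's) generates the three solutions listed above and asserts that each one satisfies both original equations.

```python
from fractions import Fraction

def example4_solution(t):
    """Point on the solution line of Example 4 for parameter value t."""
    t = Fraction(t)
    return Fraction(1, 4) + t / 2, t   # x = 1/4 + t/2, y = t

for t in (0, 1, -1):
    x, y = example4_solution(t)
    # both equations of the original system must hold for every t
    assert 4 * x - 2 * y == 1 and 16 * x - 8 * y == 4
    print(t, x, y)
```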


EXAMPLE 5 A Linear System with Infinitely Many Solutions 

Solve the linear system 

$$\begin{aligned} x - y + 2z &= 5 \\ 2x - 2y + 4z &= 10 \\ 3x - 3y + 6z &= 15 \end{aligned}$$

This system can be solved by inspection, since the second and third equations are 
multiples of the first. Geometrically, this means that the three planes coincide and that those values 
of x, y, and z that satisfy the equation 


$$x - y + 2z = 5 \qquad (9)$$

automatically satisfy all three equations. Thus, it suffices to find the solutions of 9. We can do this 
by first solving 9 for x in terms of y and z, then assigning arbitrary values r and s (parameters) to 
these two variables, and then expressing the solution by the three parametric equations 

x = 5 + r-2s, y = r, z = s 

Specific solutions can be obtained by choosing numerical values for the parameters r and s. For
example, taking r = 1 and s = 0 yields the solution (6, 1, 0).
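Similarly, the two-parameter description in Example 5 can be tested against all three original equations over a small grid of parameter values. A sketch only; `example5_solution` is a hypothetical name, and passing these checks for a few values of r and s illustrates, but does not prove, that every choice of parameters works.

```python
from fractions import Fraction
from itertools import product

def example5_solution(r, s):
    """Solution of Example 5 for parameter values r and s."""
    return 5 + r - 2 * s, r, s   # x = 5 + r - 2s, y = r, z = s

# (coefficients of x, y, z), right-hand side, for each original equation
equations = [((1, -1, 2), 5), ((2, -2, 4), 10), ((3, -3, 6), 15)]

for r, s in product([-1, 0, Fraction(1, 2), 1], repeat=2):
    point = example5_solution(r, s)
    for coeffs, rhs in equations:
        assert sum(a * v for a, v in zip(coeffs, point)) == rhs

print(example5_solution(1, 0))  # (6, 1, 0)
```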


Augmented Matrices and Elementary Row Operations 

As the number of equations and unknowns in a linear system increases, so does the complexity of the algebra
involved in finding solutions. The required computations can be made more manageable by simplifying notation
and standardizing procedures. For example, by mentally keeping track of the location of the +'s, the x's, and the
='s in the linear system


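$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

we can abbreviate the system by writing only the rectangular array of numbers

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{bmatrix}$$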


As noted in the introduction to this chapter, the 
term “matrix” is used in mathematics to denote a 
rectangular array of numbers. In a later section 
we will study matrices in detail, but for now we 
will only be concerned with augmented matrices 
for linear systems. 

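This is called the augmented matrix for the system. For example, the augmented matrix for the system of
equations

$$\begin{aligned} x_1 + x_2 + 2x_3 &= 9 \\ 2x_1 + 4x_2 - 3x_3 &= 1 \\ 3x_1 + 6x_2 - 5x_3 &= 0 \end{aligned}$$

is

$$\begin{bmatrix} 1 & 1 & 2 & 9 \\ 2 & 4 & -3 & 1 \\ 3 & 6 & -5 & 0 \end{bmatrix}$$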

The basic method for solving a linear system is to perform appropriate algebraic operations on the system that do 
not alter the solution set and that produce a succession of increasingly simpler systems, until a point is reached 
where it can be ascertained whether the system is consistent, and if so, what its solutions are. Typically, the 
algebraic operations are as follows: 

Multiply an equation through by a nonzero constant. 

Interchange two equations. 

Add a constant times one equation to another. 

Since the rows (horizontal lines) of an augmented matrix correspond to the equations in the associated system, 
these three operations correspond to the following operations on the rows of the augmented matrix: 

Multiply a row through by a nonzero constant. 

Interchange two rows. 

Add a constant times one row to another. 

These are called elementary row operations on a matrix. 
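The three elementary row operations translate directly into in-place updates of a list-of-rows matrix. This is a minimal sketch with hypothetical function names; note that Python indexes rows from 0, so row 0 is the first row. Scaling by zero is rejected because it is not an elementary row operation.

```python
from fractions import Fraction

def scale_row(M, i, k):
    """Multiply row i through by a nonzero constant k (in place)."""
    if k == 0:
        raise ValueError("multiplying by zero is not an elementary row operation")
    M[i] = [Fraction(k) * x for x in M[i]]

def swap_rows(M, i, j):
    """Interchange rows i and j (in place)."""
    M[i], M[j] = M[j], M[i]

def add_multiple(M, src, dest, k):
    """Add k times row src to row dest (in place)."""
    M[dest] = [x + Fraction(k) * y for x, y in zip(M[dest], M[src])]

# Add -2 times the first row to the second in the augmented matrix above
M = [[1, 1, 2, 9], [2, 4, -3, 1], [3, 6, -5, 0]]
add_multiple(M, 0, 1, -2)
print(M[1])  # entries 0, 2, -7, -17 (as Fractions)
```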

In the following example we will illustrate how to use elementary row operations and an augmented matrix to 
solve a linear system in three unknowns. Since a systematic procedure for solving linear systems will be 
developed in the next section, do not worry about how the steps in the example were chosen. Your objective here 
should be simply to understand the computations. 






EXAMPLE 6 Using Elementary Row Operations 


In the left column we solve a system of linear equations by operating on the equations in the 
system, and in the right column we solve the same system by operating on the rows of the 
augmented matrix. 


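$$\begin{aligned} x + y + 2z &= 9 \\ 2x + 4y - 3z &= 1 \\ 3x + 6y - 5z &= 0 \end{aligned} \qquad \begin{bmatrix} 1 & 1 & 2 & 9 \\ 2 & 4 & -3 & 1 \\ 3 & 6 & -5 & 0 \end{bmatrix}$$

Add -2 times the first equation to the second (in the matrix: add -2 times the first row to the second)
to obtain

$$\begin{aligned} x + y + 2z &= 9 \\ 2y - 7z &= -17 \\ 3x + 6y - 5z &= 0 \end{aligned} \qquad \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 2 & -7 & -17 \\ 3 & 6 & -5 & 0 \end{bmatrix}$$

Add -3 times the first equation to the third (add -3 times the first row to the third) to obtain

$$\begin{aligned} x + y + 2z &= 9 \\ 2y - 7z &= -17 \\ 3y - 11z &= -27 \end{aligned} \qquad \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 2 & -7 & -17 \\ 0 & 3 & -11 & -27 \end{bmatrix}$$

Multiply the second equation by $\tfrac{1}{2}$ (multiply the second row by $\tfrac{1}{2}$) to obtain

$$\begin{aligned} x + y + 2z &= 9 \\ y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\ 3y - 11z &= -27 \end{aligned} \qquad \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 3 & -11 & -27 \end{bmatrix}$$

Add -3 times the second equation to the third (add -3 times the second row to the third) to obtain

$$\begin{aligned} x + y + 2z &= 9 \\ y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\ -\tfrac{1}{2}z &= -\tfrac{3}{2} \end{aligned} \qquad \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & -\tfrac{1}{2} & -\tfrac{3}{2} \end{bmatrix}$$

Multiply the third equation by -2 (multiply the third row by -2) to obtain

$$\begin{aligned} x + y + 2z &= 9 \\ y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\ z &= 3 \end{aligned} \qquad \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & 1 & 3 \end{bmatrix}$$

Add -1 times the second equation to the first (add -1 times the second row to the first) to obtain

$$\begin{aligned} x + \tfrac{11}{2}z &= \tfrac{35}{2} \\ y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\ z &= 3 \end{aligned} \qquad \begin{bmatrix} 1 & 0 & \tfrac{11}{2} & \tfrac{35}{2} \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & 1 & 3 \end{bmatrix}$$

Add $-\tfrac{11}{2}$ times the third equation to the first and $\tfrac{7}{2}$ times the third equation to the second
(add $-\tfrac{11}{2}$ times the third row to the first and $\tfrac{7}{2}$ times the third row to the second) to obtain

$$\begin{aligned} x &= 1 \\ y &= 2 \\ z &= 3 \end{aligned} \qquad \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$$

The solution x = 1, y = 2, z = 3 is now evident.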
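To confirm the arithmetic of Example 6, the row operations can be replayed with exact fractions. This sketch defines its own small helpers (`scale` and `add` are hypothetical names); rows are numbered from 0, so row 0 is the text's first row, and each call is commented with the step it reproduces.

```python
from fractions import Fraction as F

def scale(M, i, k):
    """Multiply row i by the nonzero constant k."""
    M[i] = [k * x for x in M[i]]

def add(M, src, dest, k):
    """Add k times row src to row dest."""
    M[dest] = [x + k * y for x, y in zip(M[dest], M[src])]

M = [[F(v) for v in row] for row in [[1, 1, 2, 9], [2, 4, -3, 1], [3, 6, -5, 0]]]

add(M, 0, 1, -2)          # add -2 times the first row to the second
add(M, 0, 2, -3)          # add -3 times the first row to the third
scale(M, 1, F(1, 2))      # multiply the second row by 1/2
add(M, 1, 2, -3)          # add -3 times the second row to the third
scale(M, 2, -2)           # multiply the third row by -2
add(M, 1, 0, -1)          # add -1 times the second row to the first
add(M, 2, 0, F(-11, 2))   # add -11/2 times the third row to the first
add(M, 2, 1, F(7, 2))     # add 7/2 times the third row to the second

solution = [row[-1] for row in M]
print(*solution)  # 1 2 3
```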



Maxime Bôcher (1867-1918)

The first known use of augmented matrices appeared between 200 B.C.
and 100 B.C. in a Chinese manuscript entitled Nine Chapters of Mathematical Art. The
coefficients were arranged in columns rather than in rows, as they are today, but remarkably the
system was solved by performing a succession of operations on the columns. The actual
use of the term augmented matrix appears to have been introduced by the American
mathematician Maxime Bôcher in his book Introduction to Higher Algebra, published in
1907. In addition to being an outstanding research mathematician and an expert in Latin,
chemistry, philosophy, zoology, geography, meteorology, art, and music, Bôcher was an
outstanding expositor of mathematics whose elementary textbooks were greatly
appreciated by students and are still in demand today.

[Image: Courtesy of the American Mathematical Society] 






Concept Review 

Linear equation 
Homogeneous linear equation 
System of linear equations 
Solution of a linear system 
Ordered n-tuple
Consistent linear system 
Inconsistent linear system 
Parameter 

Parametric equations 
Augmented matrix 


Elementary row operations

Skills 


Determine whether a given equation is linear. 

Determine whether a given n-tuple is a solution of a linear system.

Find the augmented matrix of a linear system. 

Find the linear system corresponding to a given augmented matrix. 

Perform elementary row operations on a linear system and on its corresponding augmented matrix. 
Determine whether a linear system is consistent or inconsistent. 

Find the set of solutions to a consistent linear system. 


Exercise Set 1.1 


1. In each part, determine whether the equation is linear in $x_1$, $x_2$, and $x_3$.



(b) $x_1 + 3x_2 + x_1x_3 = 2$

(c) $x_1 = -7x_2 + 3x_3$

(d) $x_1^{-2} + x_2 + 8x_3 = 5$

(e) $x_1^{3/5} - 2x_2 + x_3 = 4$



Answer: 

(a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations 
2. In each part, determine whether the equations form a linear system. 




(a) $-2x + 4y + z = 2$



(b) x = 4 
2x = 8 

(c) $4x - y + 2z = -1$
$-x + (\ln 2)y - 3z = 0$

(d) $3z + x = -4$

$y + 5z = 1$
$6x + 2z = 3$
$-x - y - z = 4$

3. In each part, determine whether the equations form a linear system. 

(a) $2x_1 - x_4 = 5$

$-x_1 + 5x_2 + 3x_3 - 2x_4 = -1$

(b) $\sin(2x_1 + x_3) = \sqrt{5}$

e 2x 2 ~ 2*4 _ X 
x 2 

4x a = 4 

(c) - X 2 + 2 x 2 = 0 

$2x_1 + x_2 - x_2x_4 = 3$

$-x_1 + 5x_2 - x_4 = -1$

(d) + x 2 = x 2 + x 4 

Answer: 

(a) and (d) are linear systems; (b) and (c) are not linear systems 

4. For each system in Exercise 2 that is linear, determine whether it is consistent. 

5. For each system in Exercise 3 that is linear, determine whether it is consistent. 

Answer: 

(a) and (d) are both consistent 

6. Write a system of linear equations consisting of three equations in three unknowns with

(a) no solutions. 

(b) exactly one solution. 

(c) infinitely many solutions. 

7. In each part, determine whether the given vector is a solution of the linear system 

$$\begin{aligned} 2x_1 - 4x_2 - x_3 &= 1 \\ x_1 - 3x_2 + x_3 &= 1 \\ 3x_1 - 5x_2 - 3x_3 &= 1 \end{aligned}$$


(a) (3, 1, 1) 

(b) (3,-1,1) 



(c) (13,5,2) 

(e) (17,7,5) 

Answer: 

(a), (d), and (e) are solutions; (b) and (c) are not solutions 

8. In each part, determine whether the given vector is a solution of the linear system

$$\begin{aligned} x_1 + 2x_2 - 2x_3 &= 3 \\ 3x_1 - x_2 + x_3 &= 1 \\ -x_1 + 5x_2 - 5x_3 &= 5 \end{aligned}$$

w (§•!■’) 

<b) (f §,o) 

(c) (5, 8, 1)

(d) (1 10 2) 

\T 7 ’ 7 J 

9. In each part, find the solution set of the linear equation by using parameters as necessary. 

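(a) $7x - 5y = 3$

(b) $-8x_1 + 2x_2 - 5x_3 + 6x_4 = 1$

Answer:

(a) $x = \tfrac{3}{7} + \tfrac{5}{7}t$, $y = t$

(b) $x_1 = -\tfrac{1}{8} + \tfrac{1}{4}r - \tfrac{5}{8}s + \tfrac{3}{4}t$, $x_2 = r$, $x_3 = s$, $x_4 = t$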

10. In each part, find the solution set of the linear equation by using parameters as necessary. 

(a) $3x_1 - 5x_2 + 4x_3 = 7$

(b) $3v - 8w + 2x - y + 4z = 0$

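11. In each part, find a system of linear equations corresponding to the given augmented matrix.

(a) $$\begin{bmatrix} 2 & 0 & 0 \\ 3 & -4 & 0 \\ 0 & 1 & 1 \end{bmatrix}$$

(b) $$\begin{bmatrix} 3 & 0 & -2 & 5 \\ 7 & 1 & 4 & -3 \\ 0 & -2 & 1 & 7 \end{bmatrix}$$

(c) $$\begin{bmatrix} 7 & 2 & 1 & -3 & 5 \\ 1 & 2 & 4 & 0 & 1 \end{bmatrix}$$

(d) $$\begin{bmatrix} 1 & 0 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 & -2 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 4 \end{bmatrix}$$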

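Answer:

(a) $$\begin{aligned} 2x_1 &= 0 \\ 3x_1 - 4x_2 &= 0 \\ x_2 &= 1 \end{aligned}$$

(b) $$\begin{aligned} 3x_1 - 2x_3 &= 5 \\ 7x_1 + x_2 + 4x_3 &= -3 \\ -2x_2 + x_3 &= 7 \end{aligned}$$

(c) $$\begin{aligned} 7x_1 + 2x_2 + x_3 - 3x_4 &= 5 \\ x_1 + 2x_2 + 4x_3 &= 1 \end{aligned}$$

(d) $$\begin{aligned} x_1 &= 7 \\ x_2 &= -2 \\ x_3 &= 3 \\ x_4 &= 4 \end{aligned}$$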


12. In each part, find a system of linear equations corresponding to the given augmented matrix. 


(a) $$\begin{bmatrix} 2 & -1 \\ -4 & -6 \\ 1 & -1 \\ 3 & 0 \end{bmatrix}$$


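(b) $$\begin{bmatrix} 0 & 3 & -1 & -1 & -1 \\ 5 & 2 & 0 & -3 & -6 \end{bmatrix}$$

(c) $$\begin{bmatrix} 1 & 2 & 3 & 4 \\ -4 & -3 & -2 & -1 \\ 5 & -6 & 1 & 1 \\ -8 & 0 & 0 & 3 \end{bmatrix}$$

(d) $$\begin{bmatrix} 3 & 0 & 1 & -4 & 3 \\ -4 & 0 & 4 & 1 & -3 \\ -1 & 3 & 0 & -2 & -9 \\ 0 & 0 & 0 & -1 & -2 \end{bmatrix}$$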


13. In each part, find the augmented matrix for the given system of linear equations. 



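(a) $-2x_1 = 6$

$3x_1 = 8$

$9x_1 = -3$

(b) $6x_1 - x_2 + 3x_3 = 4$

$5x_2 - x_3 = 1$

(c) $2x_2 - 3x_4 + x_5 = 0$

$-3x_1 - x_2 + x_3 = -1$

$6x_1 + 2x_2 - x_3 + 2x_4 - 3x_5 = 6$

(d) $x_1 - x_5 = 7$
Answer:

(a) $$\begin{bmatrix} -2 & 6 \\ 3 & 8 \\ 9 & -3 \end{bmatrix}$$

(b) $$\begin{bmatrix} 6 & -1 & 3 & 4 \\ 0 & 5 & -1 & 1 \end{bmatrix}$$

(c) $$\begin{bmatrix} 0 & 2 & 0 & -3 & 1 & 0 \\ -3 & -1 & 1 & 0 & 0 & -1 \\ 6 & 2 & -1 & 2 & -3 & 6 \end{bmatrix}$$

(d) $$\begin{bmatrix} 1 & 0 & 0 & 0 & -1 & 7 \end{bmatrix}$$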

14. In each part, find the augmented matrix for the given system of linear equations. 

(a) $3x_1 - 2x_2 = -1$

$4x_1 + 5x_2 = 3$

$7x_1 + 3x_2 = 2$

(b) $2x_1 + 2x_3 = 1$

$3x_1 - x_2 + 4x_3 = 7$
$6x_1 + x_2 - x_3 = 0$

(c) $x_1 + 2x_2 - x_4 + x_5 = 1$

$3x_2 + x_3 - x_5 = 2$

$x_3 + 7x_4 = 1$

(d) $x_1 = 1$

$x_2 = 2$
$x_3 = 3$

15. The curve $y = ax^2 + bx + c$ shown in the accompanying figure passes through the points
$(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Show that the coefficients a, b, and c are a solution of the system of
linear equations whose augmented matrix is

$$\begin{bmatrix} x_1^2 & x_1 & 1 & y_1 \\ x_2^2 & x_2 & 1 & y_2 \\ x_3^2 & x_3 & 1 & y_3 \end{bmatrix}$$

Figure Ex-15


16. Explain why each of the three elementary row operations does not affect the solution set of a linear system. 

17. Show that if the linear equations 

$x_1 + kx_2 = c$ and $x_1 + lx_2 = d$

have the same solution set, then the two equations are identical (i.e., $k = l$ and $c = d$).

True-False Exercises 

In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 

(a) A linear system whose equations are all homogeneous must be consistent. 

Answer: 

True 

(b) Multiplying a linear equation through by zero is an acceptable elementary row operation. 

Answer: 

False 

(c) The linear system 

$$\begin{aligned} x - y &= 3 \\ 2x - 2y &= k \end{aligned}$$

cannot have a unique solution, regardless of the value of k . 

Answer: 

True 

(d) A single linear equation with two or more unknowns must always have infinitely many solutions. 

Answer: 

True 

(e) If the number of equations in a linear system exceeds the number of unknowns, then the system must be 
inconsistent. 


Answer:

False

(f) If each equation in a consistent linear system is multiplied through by a constant c, then all solutions to the 
new system can be obtained by multiplying solutions from the original system by c. 

Answer: 

False 

(g) Elementary row operations permit one equation in a linear system to be subtracted from another. 

Answer: 

True 

(h) The linear system with corresponding augmented matrix 

2 -1 

0 0 

is consistent. 

Answer: 

False 





1.2 Gaussian Elimination 

In this section we will develop a systematic procedure for solving systems of linear equations. The procedure is based on
the idea of performing certain operations on the rows of the augmented matrix for the system that simplify it to a form
from which the solution of the system can be ascertained by inspection.


Considerations in Solving Linear Systems 

When considering methods for solving systems of linear equations, it is important to distinguish between large systems 
that must be solved by computer and small systems that can be solved by hand. For example, there are many applications 
that lead to linear systems in thousands or even millions of unknowns. Large systems require special techniques to deal 
with issues of memory size, roundoff errors, solution time, and so forth. Such techniques are studied in the field of 
numerical analysis and will only be touched on in this text. However, almost all of the methods that are used for large 
systems are based on the ideas that we will develop in this section. 


Echelon Forms 


In Example 6 of the last section, we solved a linear system in the unknowns x, y, and z by reducing the augmented matrix
to the form

$$\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$$

from which the solution x = 1, y = 2, z = 3 became evident. This is an example of a matrix that is in reduced row
echelon form. To be of this form, a matrix must have the following properties:

1. If a row does not consist entirely of zeros, then the first nonzero number in the row is a 1. We call this a leading 1.

2. If there are any rows that consist entirely of zeros, then they are grouped together at the bottom of the matrix.

3. In any two successive rows that do not consist entirely of zeros, the leading 1 in the lower row occurs farther to the
right than the leading 1 in the higher row.

4. Each column that contains a leading 1 has zeros everywhere else in that column.

A matrix that has the first three properties is said to be in row echelon form. (Thus, a matrix in reduced row echelon 
form is of necessity in row echelon form, but not conversely.) 
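The four properties can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function names `leading_index`, `is_row_echelon`, and `is_reduced_row_echelon` are our own, and a matrix is assumed to be a list of rows with exact (integer or `Fraction`) entries.

```python
def leading_index(row):
    """Return the column index of the first nonzero entry of `row`, or None for a zero row."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None


def is_row_echelon(matrix):
    """Check properties 1-3: leading 1s, zero rows at the bottom, leading 1s move right."""
    seen_zero_row = False
    previous_lead = -1
    for row in matrix:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:          # property 2: a nonzero row may not follow a zero row
            return False
        if row[lead] != 1:         # property 1: the first nonzero entry must be a 1
            return False
        if lead <= previous_lead:  # property 3: each leading 1 lies farther right
            return False
        previous_lead = lead
    return True


def is_reduced_row_echelon(matrix):
    """Row echelon form plus property 4: a leading-1 column is zero everywhere else."""
    if not is_row_echelon(matrix):
        return False
    for i, row in enumerate(matrix):
        lead = leading_index(row)
        if lead is None:
            continue
        if any(other[lead] != 0 for k, other in enumerate(matrix) if k != i):
            return False
    return True


# Two of the matrices from Example 1.
rref_example = [[1, 0, 0, 4], [0, 1, 0, 7], [0, 0, 1, -1]]
ref_example = [[1, 4, -3, 7], [0, 1, 6, 2], [0, 0, 1, 5]]
print(is_reduced_row_echelon(rref_example))                              # True
print(is_row_echelon(ref_example), is_reduced_row_echelon(ref_example))  # True False
```

The second matrix fails property 4 because the column holding its second leading 1 also contains the entry 4 above it.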


EXAMPLE 1 Row Echelon and Reduced Row Echelon Form 


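The following matrices are in reduced row echelon form.

$$\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & 7 \\ 0 & 0 & 1 & -1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & -2 & 0 & 1 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

The following matrices are in row echelon form but not reduced row echelon form.

$$\begin{bmatrix} 1 & 4 & -3 & 7 \\ 0 & 1 & 6 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix},\quad \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & 2 & 6 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$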


















EXAMPLE 2 More on Row Echelon and Reduced Row Echelon Form 


As Example 1 illustrates, a matrix in row echelon form has zeros below each leading 1, whereas a matrix in 
reduced row echelon form has zeros below and above each leading 1. Thus, with any real numbers substituted for 
the *'s, all matrices of the following types are in row echelon form: 


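$$\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & * & * & * & * & * & * & * & * \\ 0 & 0 & 0 & 1 & * & * & * & * & * & * \\ 0 & 0 & 0 & 0 & 1 & * & * & * & * & * \\ 0 & 0 & 0 & 0 & 0 & 1 & * & * & * & * \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & * \end{bmatrix}$$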

All matrices of the following types are in reduced row echelon form: 


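$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 0 & * \\ 0 & 1 & 0 & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & * & 0 & 0 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 1 & 0 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & * \end{bmatrix}$$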


If, by a sequence of elementary row operations, the augmented matrix for a system of linear equations is put in reduced 
row echelon form, then the solution set can be obtained either by inspection or by converting certain linear equations to 
parametric form. Here are some examples. 

In Example 3 we could, if desired, express the 
solution more succinctly as the 4-tuple (3, -1, 0, 5). 


EXAMPLE 3 Unique Solution 

Suppose that the augmented matrix for a linear system in the unknowns $x_1$, $x_2$, $x_3$, and $x_4$ has been reduced
by elementary row operations to

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 5 \end{bmatrix}$$

This matrix is in reduced row echelon form and corresponds to the equations

$$\begin{aligned} x_1 &= 3 \\ x_2 &= -1 \\ x_3 &= 0 \\ x_4 &= 5 \end{aligned}$$

Thus, the system has a unique solution, namely, $x_1 = 3$, $x_2 = -1$, $x_3 = 0$, $x_4 = 5$.




















EXAMPLE 4 Linear Systems in Three Unknowns 


In each part, suppose that the augmented matrix for a linear system in the unknowns x, y, and z has been 
reduced by elementary row operations to the given reduced row echelon form. Solve the system. 



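(a) $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -4 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & -5 & 1 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$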


Solution

(a) The equation that corresponds to the last row of the augmented matrix is

$$0x + 0y + 0z = 1$$

Since this equation is not satisfied by any values of x, y, and z, the system is inconsistent.

(b) The equation that corresponds to the last row of the augmented matrix is

$$0x + 0y + 0z = 0$$

This equation can be omitted since it imposes no restrictions on x, y, and z; hence, the linear system
corresponding to the augmented matrix is

$$\begin{aligned} x + 3z &= -1 \\ y - 4z &= 2 \end{aligned}$$

Since x and y correspond to the leading 1's in the augmented matrix, we call these the leading
variables. The remaining variables (in this case z) are called free variables. Solving for the leading
variables in terms of the free variables gives

$$\begin{aligned} x &= -1 - 3z \\ y &= 2 + 4z \end{aligned}$$

From these equations we see that the free variable z can be treated as a parameter and assigned an
arbitrary value, t, which then determines values for x and y. Thus, the solution set can be represented
by the parametric equations

$$x = -1 - 3t,\quad y = 2 + 4t,\quad z = t$$

By substituting various values for t in these equations we can obtain various solutions of the system.
For example, setting t = 0 yields the solution

$$x = -1,\quad y = 2,\quad z = 0$$

and setting t = 1 yields the solution

$$x = -4,\quad y = 6,\quad z = 1$$

(c) As explained in part (b), we can omit the equations corresponding to the zero rows, in which case the
linear system associated with the augmented matrix consists of the single equation

$$x - 5y + z = 4 \tag{1}$$

from which we see that the solution set is a plane in three-dimensional space. Although (1) is a valid
form of the solution set, there are many applications in which it is preferable to express the solution
set in parametric form. We can convert (1) to parametric form by solving for the leading variable x in
terms of the free variables y and z to obtain

$$x = 4 + 5y - z$$

From this equation we see that the free variables can be assigned arbitrary values, say y = s and z = t,
which then determine the value of x. Thus, the solution set can be expressed parametrically as








$$x = 4 + 5s - t,\quad y = s,\quad z = t \tag{2}$$

We will usually denote parameters in a general solution by the letters r, s, t, ..., but any letters that do not
conflict with the names of the unknowns can be used. For systems with more than three unknowns, subscripted
letters such as $t_1, t_2, t_3, \ldots$ are convenient.


Formulas, such as (2), that express the solution set of a linear system parametrically have some associated terminology.


DEFINITION 1 

If a linear system has infinitely many solutions, then a set of parametric equations from which all solutions can
be obtained by assigning numerical values to the parameters is called a general solution of the system.
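The reasoning used in Examples 3 and 4 (omit zero rows, detect a row that reads 0 = 1, and solve each leading variable in terms of the free variables) can be sketched in Python as follows. This is an illustrative sketch under our own assumptions: the augmented matrix is already in reduced row echelon form, unknowns are indexed from 0, and the helper names `classify_rref` and `evaluate_general_solution` are hypothetical.

```python
from fractions import Fraction


def classify_rref(rref):
    """Classify a system whose augmented matrix is already in reduced row echelon form.

    Returns ("inconsistent", None) or ("consistent", (leading, free)), where `leading`
    maps each leading-variable index to the row holding its leading 1 and `free` lists
    the free-variable indices. Indices are 0-based.
    """
    n = len(rref[0]) - 1  # number of unknowns; the last column holds the constants
    leading = {}
    for i, row in enumerate(rref):
        nonzero = [j for j in range(n) if row[j] != 0]
        if not nonzero:
            if row[n] != 0:  # the row reads 0x_1 + ... + 0x_n = (nonzero constant)
                return "inconsistent", None
            continue  # a zero row imposes no restriction, so it is omitted
        leading[nonzero[0]] = i  # the first nonzero entry is the leading 1
    free = [j for j in range(n) if j not in leading]
    return "consistent", (leading, free)


def evaluate_general_solution(rref, params):
    """Assign params[k] to the k-th free variable and solve for the leading variables."""
    status, info = classify_rref(rref)
    if status == "inconsistent":
        raise ValueError("the system is inconsistent")
    leading, free = info
    if len(params) != len(free):
        raise ValueError(f"expected {len(free)} parameter value(s)")
    n = len(rref[0]) - 1
    x = [Fraction(0)] * n
    for var, value in zip(free, params):
        x[var] = Fraction(value)
    for var, i in leading.items():
        row = rref[i]
        # In reduced row echelon form only free variables appear beside the leading 1.
        x[var] = Fraction(row[n]) - sum(Fraction(row[j]) * x[j] for j in free)
    return x


# Example 4(b): x + 3z = -1, y - 4z = 2, with z = t free.
example_4b = [[1, 0, 3, -1], [0, 1, -4, 2], [0, 0, 0, 0]]
print(classify_rref(example_4b))  # ('consistent', ({0: 0, 1: 1}, [2]))
print(evaluate_general_solution(example_4b, [1]))  # the solution x = -4, y = 6, z = 1
```

Each list of parameter values yields one solution, which mirrors how a general solution produces every solution of the system.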


Elimination Methods 

We have just seen how easy it is to solve a system of linear equations once its augmented matrix is in reduced row 
echelon form. Now we will give a step-by-step elimination procedure that can be used to reduce any matrix to reduced 
row echelon form. As we state each step in the procedure, we illustrate the idea by reducing the following matrix to 
reduced row echelon form. 


$$\begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$


Step 1. Locate the leftmost column that does not consist entirely of zeros.

$$\begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$

(The leftmost nonzero column is the first column.)



Step 2. Interchange the top row with another row, if necessary, to bring a nonzero entry to the top of the column found in
Step 1.

$$\begin{bmatrix} 2 & 4 & -10 & 6 & 12 & 28 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$

The first and second rows in the preceding matrix were interchanged.









Step 3. If the entry that is now at the top of the column found in Step 1 is a, multiply the first row by 1/a in order to
introduce a leading 1.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$

The first row of the preceding matrix was multiplied by 1/2.


Step 4. Add suitable multiples of the top row to the rows below so that all entries below the leading 1 become zeros.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}$$

-2 times the first row of the preceding matrix was added to the third row.
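Steps 2 through 4 each apply one of the three elementary row operations. The Python sketch below is our own illustration (the helper names `swap_rows`, `scale_row`, and `add_multiple` are hypothetical); it uses `fractions.Fraction` for exact arithmetic and replays those three steps on the example matrix.

```python
from fractions import Fraction


def swap_rows(m, i, j):
    """Interchange rows i and j (0-based) in place."""
    m[i], m[j] = m[j], m[i]


def scale_row(m, i, c):
    """Multiply row i by the nonzero constant c in place."""
    if c == 0:
        raise ValueError("multiplying a row by zero is not an elementary row operation")
    m[i] = [c * entry for entry in m[i]]


def add_multiple(m, source, target, c):
    """Add c times row `source` to row `target` in place."""
    m[target] = [t + c * s for t, s in zip(m[target], m[source])]


A = [[Fraction(v) for v in row] for row in [
    [0, 0, -2, 0, 7, 12],
    [2, 4, -10, 6, 12, 28],
    [2, 4, -5, 6, -5, -1],
]]
swap_rows(A, 0, 1)               # Step 2: interchange the first and second rows
scale_row(A, 0, Fraction(1, 2))  # Step 3: multiply the first row by 1/2
add_multiple(A, 0, 2, -2)        # Step 4: add -2 times the first row to the third row
for row in A:
    print(" ".join(str(v) for v in row))
```

The printed rows match the matrix shown after Step 4.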


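Step 5. Now cover the top row in the matrix and begin again with Step 1 applied to the submatrix that remains. Continue
in this way until the entire matrix is in row echelon form.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}$$

(The submatrix consists of the second and third rows; its leftmost nonzero column is the third column.)

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}$$

The first row in the submatrix was multiplied by -1/2 to introduce a leading 1.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & \tfrac{1}{2} & 1 \end{bmatrix}$$

-5 times the first row of the submatrix was added to the second row of the submatrix to introduce a zero below the
leading 1.

(The top row of the submatrix is now covered and we return to Step 1. The new submatrix is the third row; its leftmost
nonzero column is the fifth column.)

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

The first (and only) row in the new submatrix was multiplied by 2 to introduce a leading 1.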


The entire matrix is now in row echelon form. To find the reduced row echelon form we need the following additional 
step. 

Step 6. Beginning with the last nonzero row and working upward, add suitable multiples of each row to the rows above
to introduce zeros above the leading 1's.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

7/2 times the third row of the preceding matrix was added to the second row.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

-6 times the third row was added to the first row.

$$\begin{bmatrix} 1 & 2 & 0 & 3 & 0 & 7 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

5 times the second row was added to the first row.


The last matrix is in reduced row echelon form. 


The procedure (or algorithm) we have just described for reducing a matrix to reduced row echelon form is called
Gauss-Jordan elimination. This algorithm consists of two parts, a forward phase in which zeros are introduced below the
leading 1's and then a backward phase in which zeros are introduced above the leading 1's. If only the forward phase is
used, then the procedure produces a row echelon form only and is called Gaussian elimination. For example, in the
preceding computations a row echelon form was obtained at the end of Step 5.
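The six steps translate directly into code. The sketch below is one possible Python implementation, not the book's own; it assumes exact rational arithmetic with `fractions.Fraction` (so no roundoff arises) and a matrix given as a list of rows, and the name `gauss_jordan` is our own. In Step 2 it interchanges with the first row below that has a nonzero entry in the column, which reproduces the interchange made in the worked example.

```python
from fractions import Fraction


def gauss_jordan(matrix):
    """Return the reduced row echelon form of `matrix` (a list of rows), using exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    pivots = []  # (row, column) of each leading 1, in the order they are created
    top = 0      # first row of the submatrix still being processed
    # Forward phase (Steps 1-5).
    for col in range(cols):
        if top == rows:
            break
        # Steps 1 and 2: find a nonzero entry in this column of the submatrix.
        pivot_row = next((r for r in range(top, rows) if m[r][col] != 0), None)
        if pivot_row is None:
            continue  # this column is all zeros in the submatrix; move right
        m[top], m[pivot_row] = m[pivot_row], m[top]
        # Step 3: multiply the row by 1/a to introduce a leading 1.
        a = m[top][col]
        m[top] = [v / a for v in m[top]]
        # Step 4: add multiples of this row to make the entries below it zero.
        for r in range(top + 1, rows):
            factor = m[r][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[top])]
        pivots.append((top, col))
        top += 1  # Step 5: cover this row and repeat on the remaining submatrix
    # Backward phase (Step 6): from the last leading 1 upward, clear the entries above.
    for r, col in reversed(pivots):
        for above in range(r):
            factor = m[above][col]
            m[above] = [x - factor * y for x, y in zip(m[above], m[r])]
    return m


A = [
    [0, 0, -2, 0, 7, 12],
    [2, 4, -10, 6, 12, 28],
    [2, 4, -5, 6, -5, -1],
]
for row in gauss_jordan(A):
    print(" ".join(str(v) for v in row))
# 1 2 0 3 0 7
# 0 0 1 0 0 1
# 0 0 0 0 1 2
```

Stopping after the forward-phase loop (and returning `m` there) would give the Gaussian elimination variant, which produces a row echelon form only.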



Carl Friedrich Gauss (1777-1855) 



Although versions of Gaussian elimination were known much earlier, the power of the method 
was not recognized until the great German mathematician Carl Friedrich Gauss used it to compute the orbit of 
the asteroid Ceres from limited data. What happened was this: On January 1, 1801 the Sicilian astronomer 
Giuseppe Piazzi (1746-1826) noticed a dim celestial object that he believed might be a “missing planet.” He 
named the object Ceres and made a limited number of positional observations but then lost the object as it neared 
the Sun. Gauss undertook the problem of computing the orbit from the limited data using least squares and the 
procedure that we now call Gaussian elimination. The work of Gauss caused a sensation when Ceres reappeared
a year later in the constellation Virgo at almost the precise position that Gauss predicted! The method was further
popularized by the German engineer Wilhelm Jordan in his handbook on geodesy (the science of measuring 
Earth shapes) entitled Handbuch der Vermessungskunde and published in 1888. 

[Images: Granger Collection (Gauss); wikipedia (Jordan)] 


EXAMPLE 5 Gauss-Jordan Elimination 


Solve by Gauss-Jordan elimination. 

$$\begin{aligned} x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\ 2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= -1 \\ 5x_3 + 10x_4 + 15x_6 &= 5 \\ 2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 6 \end{aligned}$$


The augmented matrix for the system is

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 & -1 \\ 0 & 0 & 5 & 10 & 0 & 15 & 5 \\ 2 & 6 & 0 & 8 & 4 & 18 & 6 \end{bmatrix}$$

Adding -2 times the first row to the second and fourth rows gives

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & -1 & -2 & 0 & -3 & -1 \\ 0 & 0 & 5 & 10 & 0 & 15 & 5 \\ 0 & 0 & 4 & 8 & 0 & 18 & 6 \end{bmatrix}$$


Multiplying the second row by -1 and then adding -5 times the new second row to the third row and -4
times the new second row to the fourth row gives

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 6 & 2 \end{bmatrix}$$


Interchanging the third and fourth rows and then multiplying the third row of the resulting matrix by 1/6
gives the row echelon form

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

This completes the forward phase since there are zeros below the leading 1's.


Adding -3 times the third row to the second row and then adding 2 times the second row of the resulting
matrix to the first row yields the reduced row echelon form

$$\begin{bmatrix} 1 & 3 & 0 & 4 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

This completes the backward phase since there are zeros above the leading 1's.


The corresponding system of equations is

$$\begin{aligned} x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\ x_3 + 2x_4 &= 0 \\ x_6 &= \tfrac{1}{3} \end{aligned} \tag{3}$$

Note that in constructing the linear system in (3) we ignored the row of zeros in the corresponding augmented matrix.
Why is this justified?

Solving for the leading variables we obtain

$$\begin{aligned} x_1 &= -3x_2 - 4x_4 - 2x_5 \\ x_3 &= -2x_4 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$



Finally, we express the general solution of the system parametrically by assigning the free variables $x_2$, $x_4$,
and $x_5$ arbitrary values r, s, and t, respectively. This yields

$$x_1 = -3r - 4s - 2t,\quad x_2 = r,\quad x_3 = -2s,\quad x_4 = s,\quad x_5 = t,\quad x_6 = \tfrac{1}{3}$$
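A general solution can be checked by substitution. This short Python check is our own, not from the text: it plugs the parametric formulas above into all four original equations for a small grid of parameter values, using `Fraction` so that the value 1/3 is exact. The names `general_solution` and `satisfies` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Coefficients and constants of the system in Example 5.
coefficients = [
    [1, 3, -2, 0, 2, 0],
    [2, 6, -5, -2, 4, -3],
    [0, 0, 5, 10, 0, 15],
    [2, 6, 0, 8, 4, 18],
]
constants = [0, -1, 5, 6]


def general_solution(r, s, t):
    """Parametric solution (x1, ..., x6) obtained by Gauss-Jordan elimination."""
    return [-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, Fraction(1, 3)]


def satisfies(x):
    """True when x satisfies every equation of the system exactly."""
    return all(sum(a * xi for a, xi in zip(row, x)) == b
               for row, b in zip(coefficients, constants))


# Every choice of r, s, t in {-2, ..., 2} should give a solution.
checked = [satisfies(general_solution(r, s, t)) for r, s, t in product(range(-2, 3), repeat=3)]
print(all(checked), len(checked))  # True 125
```

A spot check like this cannot prove that every solution has this form, but it quickly catches arithmetic slips in the elimination.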



Homogeneous Linear Systems 

A system of linear equations is said to be homogeneous if the constant terms are all zero; that is, the system has the form

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\ \,\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$

Every homogeneous system of linear equations is consistent because all such systems have $x_1 = 0, x_2 = 0, \ldots, x_n = 0$ as
a solution. This solution is called the trivial solution; if there are other solutions, they are called nontrivial solutions.

Because a homogeneous linear system always has the trivial solution, there are only two possibilities for its solutions: 
The system has only the trivial solution. 

The system has infinitely many solutions in addition to the trivial solution. 

In the special case of a homogeneous linear system of two equations in two unknowns, say 

$$\begin{aligned} a_1x + b_1y &= 0 \quad (a_1, b_1 \text{ not both zero}) \\ a_2x + b_2y &= 0 \quad (a_2, b_2 \text{ not both zero}) \end{aligned}$$

the graphs of the equations are lines through the origin, and the trivial solution corresponds to the point of intersection at 
the origin (Figure 1.2.1). 


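[Figure 1.2.1: two panels in the xy-plane. "Only the trivial solution": the lines $a_1x + b_1y = 0$ and $a_2x + b_2y = 0$ cross only at the origin. "Infinitely many solutions": $a_1x + b_1y = 0$ and $a_2x + b_2y = 0$ are the same line.]

Figure 1.2.1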

There is one case in which a homogeneous system is assured of having nontrivial solutions—namely, whenever the 
system involves more unknowns than equations. To see why, consider the following example of four equations in six 
unknowns. 

EXAMPLE 6 A Homogeneous System 

Use Gauss-Jordan elimination to solve the homogeneous linear system

$$\begin{aligned} x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\ 2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\ 5x_3 + 10x_4 + 15x_6 &= 0 \\ 2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0 \end{aligned} \tag{4}$$


Observe first that the coefficients of the unknowns in this system are the same as those in 
Example 5; that is, the two systems differ only in the constants on the right side. The augmented matrix for 
the given homogeneous system is 


$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 & 0 \\ 0 & 0 & 5 & 10 & 0 & 15 & 0 \\ 2 & 6 & 0 & 8 & 4 & 18 & 0 \end{bmatrix} \tag{5}$$

which is the same as the augmented matrix for the system in Example 5, except for zeros in the last
column. Thus, the reduced row echelon form of this matrix will be the same as that of the augmented
matrix in Example 5, except for the last column. However, a moment's reflection will make it evident that
a column of zeros is not changed by an elementary row operation, so the reduced row echelon form of (5) is

$$\begin{bmatrix} 1 & 3 & 0 & 4 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{6}$$

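The corresponding system of equations is

$$\begin{aligned} x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\ x_3 + 2x_4 &= 0 \\ x_6 &= 0 \end{aligned}$$

Solving for the leading variables we obtain

$$\begin{aligned} x_1 &= -3x_2 - 4x_4 - 2x_5 \\ x_3 &= -2x_4 \\ x_6 &= 0 \end{aligned} \tag{7}$$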

If we now assign the free variables $x_2$, $x_4$, and $x_5$ arbitrary values r, s, and t, respectively, then we can
express the solution set parametrically as

$$x_1 = -3r - 4s - 2t,\quad x_2 = r,\quad x_3 = -2s,\quad x_4 = s,\quad x_5 = t,\quad x_6 = 0$$

Note that the trivial solution results when r = s = t = 0.


Free Variables in Homogeneous Linear Systems

Example 6 illustrates two important points about solving homogeneous linear systems: 

Elementary row operations do not alter columns of zeros in a matrix, so the reduced row echelon form of the 
augmented matrix for a homogeneous linear system has a final column of zeros. This implies that the linear system 
corresponding to the reduced row echelon form is homogeneous, just like the original system. 

When we constructed the homogeneous linear system corresponding to augmented matrix (6), we ignored the row of
zeros because the corresponding equation

$$0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 = 0$$

does not impose any conditions on the unknowns. Thus, depending on whether or not the reduced row echelon form
of the augmented matrix for a homogeneous linear system has any rows of zeros, the linear system corresponding to
that reduced row echelon form will either have the same number of equations as the original system or it will have
fewer.

Now consider a general homogeneous linear system with n unknowns, and suppose that the reduced row echelon form of
the augmented matrix has r nonzero rows. Since each nonzero row has a leading 1, and since each leading 1 corresponds
to a leading variable, the homogeneous system corresponding to the reduced row echelon form of the augmented matrix
must have r leading variables and n - r free variables. Thus, this system is of the form

$$\begin{aligned} x_{k_1} + \textstyle\sum(\ ) &= 0 \\ x_{k_2} + \textstyle\sum(\ ) &= 0 \\ &\ \,\vdots \\ x_{k_r} + \textstyle\sum(\ ) &= 0 \end{aligned}$$

where in each equation the expression $\sum(\ )$ denotes a sum that involves the free variables, if any [see (7), for example]. In
summary, we have the following result.


THEOREM 1.2.1 Free Variable Theorem for Homogeneous Systems

If a homogeneous linear system has n unknowns, and if the reduced row echelon form of its augmented matrix
has r nonzero rows, then the system has n - r free variables.
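Theorem 1.2.1 can be illustrated numerically on Example 6. The Python sketch below is our own illustration, and the name `count_nonzero_rows` is hypothetical. To keep it short it runs only the forward phase and does not rescale leading entries to 1 (rescaling a row cannot change whether it is zero); this relies on the fact, stated without proof in this section, that all row echelon forms of a matrix have the same number of zero rows.

```python
from fractions import Fraction


def count_nonzero_rows(matrix):
    """Run the forward phase of elimination and return the number r of nonzero rows
    in the resulting row echelon form. Leading entries are not rescaled to 1, since
    rescaling a row does not change whether it is zero."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0  # number of pivot rows found so far
    for col in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Augmented matrix (5) of the homogeneous system in Example 6.
augmented = [
    [1, 3, -2, 0, 2, 0, 0],
    [2, 6, -5, -2, 4, -3, 0],
    [0, 0, 5, 10, 0, 15, 0],
    [2, 6, 0, 8, 4, 18, 0],
]
n = len(augmented[0]) - 1          # number of unknowns
r = count_nonzero_rows(augmented)  # number of nonzero rows
print(n, r, n - r)                 # 6 3 3
```

The three free variables predicted here are $x_2$, $x_4$, and $x_5$, the ones assigned the parameters r, s, and t in Example 6.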


Note that Theorem 1.2.2 applies only to
homogeneous systems: a nonhomogeneous system
with more unknowns than equations need not be
consistent. However, we will prove later that if a
nonhomogeneous system with more unknowns than
equations is consistent, then it has infinitely many
solutions.


Theorem 1.2.1 has an important implication for homogeneous linear systems with more unknowns than equations. 
Specifically, if a homogeneous linear system has m equations in n unknowns, and if m < n, then it must also be true that 
r < n (why?). This being the case, the theorem implies that there is at least one free variable, and this implies in turn that
the system has infinitely many solutions. Thus, we have the following result. 


THEOREM 1.2.2 

A homogeneous linear system with more unknowns than equations has infinitely many solutions. 


In retrospect, we could have anticipated that the homogeneous system in Example 6 would have infinitely many 
solutions since it has four equations in six unknowns. 


Gaussian Elimination and Back-Substitution 


For small linear systems that are solved by hand (such as most of those in this text), Gauss-Jordan elimination (reduction 
to reduced row echelon form) is a good procedure to use. However, for large linear systems that require a computer 
solution, it is generally more efficient to use Gaussian elimination (reduction to row echelon form) followed by a 
technique known as back-substitution to complete the process of solving the system. The next example illustrates this 
technique. 

EXAMPLE 7 Example 5 Solved by Back-Substitution 

From the computations in Example 5, a row echelon form of the augmented matrix is 

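$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

To solve the corresponding system of equations

$$\begin{aligned} x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\ x_3 + 2x_4 + 3x_6 &= 1 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$

we proceed as follows:

Step 1. Solve the equations for the leading variables.

$$\begin{aligned} x_1 &= -3x_2 + 2x_3 - 2x_5 \\ x_3 &= 1 - 2x_4 - 3x_6 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$

Step 2. Beginning with the bottom equation and working upward, successively substitute each equation
into all the equations above it.

Substituting $x_6 = \tfrac{1}{3}$ into the second equation yields

$$\begin{aligned} x_1 &= -3x_2 + 2x_3 - 2x_5 \\ x_3 &= -2x_4 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$

Substituting $x_3 = -2x_4$ into the first equation yields

$$\begin{aligned} x_1 &= -3x_2 - 4x_4 - 2x_5 \\ x_3 &= -2x_4 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$

Step 3. Assign arbitrary values to the free variables, if any.

If we now assign $x_2$, $x_4$, and $x_5$ the arbitrary values r, s, and t, respectively, the general solution is given by
the formulas

$$x_1 = -3r - 4s - 2t,\quad x_2 = r,\quad x_3 = -2s,\quad x_4 = s,\quad x_5 = t,\quad x_6 = \tfrac{1}{3}$$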
This agrees with the solution obtained in Example 5. 


EXAMPLE 8 

Suppose that the matrices below are augmented matrices for linear systems in the unknowns $x_1$, $x_2$, $x_3$, and
$x_4$. These matrices are all in row echelon form but not reduced row echelon form. Discuss the existence
and uniqueness of solutions to the corresponding linear systems.


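(a) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}$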


Solution

(a) The last row corresponds to the equation

$$0x_1 + 0x_2 + 0x_3 + 0x_4 = 1$$

from which it is evident that the system is inconsistent.

(b) The last row corresponds to the equation

$$0x_1 + 0x_2 + 0x_3 + 0x_4 = 0$$

which has no effect on the solution set. In the remaining three equations the variables $x_1$, $x_2$, and $x_3$
correspond to leading 1's and hence are leading variables. The variable $x_4$ is a free variable. With a
little algebra, the leading variables can be expressed in terms of the free variable, and the free variable
can be assigned an arbitrary value. Thus, the system must have infinitely many solutions.

(c) The last row corresponds to the equation

$$x_4 = 0$$

which gives us a numerical value for $x_4$. If we substitute this value into the third equation, namely,

$$x_3 + 6x_4 = 9$$

we obtain $x_3 = 9$. You should now be able to see that if we continue this process and substitute the
known values of $x_3$ and $x_4$ into the equation corresponding to the second row, we will obtain a unique
numerical value for $x_2$; and if, finally, we substitute the known values of $x_4$, $x_3$, and $x_2$ into the
equation corresponding to the first row, we will produce a unique numerical value for $x_1$. Thus, the
system has a unique solution.
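The text stops at concluding that part (c) has a unique solution; the substitution process it describes can be carried out mechanically. Below is a minimal Python sketch of back-substitution, assuming (as in part (c)) a square coefficient part with a leading 1 in every diagonal position; the function name `back_substitute` is hypothetical.

```python
from fractions import Fraction


def back_substitute(ref):
    """Solve a system whose augmented matrix is n x (n + 1), in row echelon form,
    with a leading 1 in every diagonal position (so the solution is unique).
    Works from the bottom equation upward, substituting values already found."""
    n = len(ref)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        row = ref[i]
        if row[i] != 1:
            raise ValueError("expected a leading 1 on the diagonal")
        known = sum(Fraction(row[j]) * x[j] for j in range(i + 1, n))
        x[i] = Fraction(row[n]) - known
    return x


# Augmented matrix (c) of Example 8.
example_8c = [
    [1, -3, 7, 2, 5],
    [0, 1, 2, -4, 1],
    [0, 0, 1, 6, 9],
    [0, 0, 0, 1, 0],
]
print([str(v) for v in back_substitute(example_8c)])  # ['-109', '-17', '9', '0']
```

So $x_4 = 0$, $x_3 = 9$, $x_2 = -17$, and $x_1 = -109$; substituting these values into the four equations confirms them.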


Some Facts About Echelon Forms 


There are three facts about row echelon forms and reduced row echelon forms that are important to know but we will not
prove:

1. Every matrix has a unique reduced row echelon form; that is, regardless of whether you use Gauss-Jordan elimination
or some other sequence of elementary row operations, the same reduced row echelon form will result in the end.

2. Row echelon forms are not unique; that is, different sequences of elementary row operations can result in different
row echelon forms.

3. Although row echelon forms are not unique, all row echelon forms of a matrix A have the same number of zero rows,
and the leading 1's always occur in the same positions in the row echelon forms of A. Those are called the pivot
positions of A. A column that contains a pivot position is called a pivot column of A.

EXAMPLE 9 Pivot Positions and Columns 


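Earlier in this section (immediately after Definition 1) we found a row echelon form of

$$A = \begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$

to be

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

The leading 1's occur in positions (row 1, column 1), (row 2, column 3), and (row 3, column 5). These are
the pivot positions. The pivot columns are columns 1, 3, and 5.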
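Reading off pivot positions is a simple scan for the first nonzero entry in each row. The Python sketch below (our own; `pivot_positions` is a hypothetical name) reports 1-based (row, column) pairs to match the text, and applying it to the reduced row echelon form of A as well illustrates that different echelon forms of A share the same pivot positions.

```python
from fractions import Fraction


def pivot_positions(ref):
    """Return 1-based (row, column) pairs of the leading entries of a row echelon form."""
    positions = []
    for i, row in enumerate(ref, start=1):
        for j, entry in enumerate(row, start=1):
            if entry != 0:
                positions.append((i, j))
                break  # only the first nonzero entry of each row is a leading entry
    return positions


ref = [
    [1, 2, -5, 3, 6, 14],
    [0, 0, 1, 0, Fraction(-7, 2), -6],
    [0, 0, 0, 0, 1, 2],
]
rref = [
    [1, 2, 0, 3, 0, 7],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 2],
]
positions = pivot_positions(ref)
print(positions)                                # [(1, 1), (2, 3), (3, 5)]
print(sorted({col for _, col in positions}))    # [1, 3, 5]
print(pivot_positions(rref) == positions)       # True
```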


Roundoff Error and Instability 

There is often a gap between mathematical theory and its practical implementation—Gauss-Jordan elimination and 
Gaussian elimination being good examples. The problem is that computers generally approximate numbers, thereby 
introducing roundoff errors, so unless precautions are taken, successive calculations may degrade an answer to a degree 
that makes it useless. Algorithms (procedures) in which this happens are called unstable. There are various techniques 
for minimizing roundoff error and instability. For example, it can be shown that for large linear systems Gauss-Jordan 
elimination involves roughly 50% more operations than Gaussian elimination, so most computer algorithms are based on 
the latter method. Some of these matters will be considered in Chapter 9. 
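A small illustration of how roundoff can ruin an answer (our own example, not from the text): the 2 x 2 system $10^{-20}x_1 + x_2 = 1$, $x_1 + x_2 = 2$ has a solution with both unknowns extremely close to 1. Eliminating with the tiny coefficient as the pivot in ordinary floating-point arithmetic loses $x_1$ entirely. First interchanging the rows (an instance of the common precaution, often called partial pivoting, of choosing a pivot of large absolute value) or working in exact `Fraction` arithmetic gives the expected result. The helper `solve_2x2` is hypothetical and handles only this 2 x 2 elimination.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x1 + b*x2 = e, c*x1 + d*x2 = f by using a as the pivot (no row interchange)."""
    m = c / a                          # multiplier for eliminating x1 from the second equation
    d2, f2 = d - m * b, f - m * e      # second equation after elimination: d2*x2 = f2
    x2 = f2 / d2
    x1 = (e - b * x2) / a              # back-substitution into the first equation
    return x1, x2


eps = 1e-20
# Tiny pivot in floating point: 1 - 1e20 and 2 - 1e20 both round to -1e20, so x1 is lost.
print(solve_2x2(eps, 1.0, 1.0, 1.0, 1.0, 2.0))  # (0.0, 1.0)
# Same system with the two equations interchanged first.
print(solve_2x2(1.0, 1.0, eps, 1.0, 2.0, 1.0))  # (1.0, 1.0)
# Same system in exact arithmetic, converted to floats only at the end.
exact = solve_2x2(Fraction(eps), Fraction(1), Fraction(1), Fraction(1), Fraction(1), Fraction(2))
print([float(v) for v in exact])                # [1.0, 1.0]
```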






Concept Review 

Reduced row echelon form
Row echelon form
Leading 1
Leading variables
Free variables
General solution to a linear system
Gaussian elimination
Gauss-Jordan elimination
Forward phase
Backward phase
Homogeneous linear system
Trivial solution
Nontrivial solution
Dimension Theorem for Homogeneous Systems
Back-substitution

Skills 

Recognize whether a given matrix is in row echelon form, reduced row echelon form, or neither. 

Construct solutions to linear systems whose corresponding augmented matrices are in row echelon form or
reduced row echelon form.

Use Gaussian elimination to find the general solution of a linear system. 

Use Gauss-Jordan elimination in order to find the general solution of a linear system. 

Analyze homogeneous linear systems using the Free Variable Theorem for Homogeneous Systems. 


Exercise Set 1.2 


1. In each part, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither. 


(a) 


(b) 


(c) 


(d) 


0 0 


0 0 0 


0 0 0 


3 1 
2 4 







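(e) $\begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & -7 & 5 & 5 \\ 0 & 1 & 3 & 2 \end{bmatrix}$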

Answer: 

(a) Both 

(b) Both 

(c) Both 

(d) Both 

(e) Both 

(f) Both 

(g) Row echelon 

2. In each part, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither. 

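(a) $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 3 & 4 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 5 & -3 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(e) $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(f) $\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 0 & 7 & 1 & 3 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & -2 & 0 & 1 \\ 0 & 0 & 1 & -2 \end{bmatrix}$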

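3. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations
to the given row echelon form. Solve the system.

(a) $\begin{bmatrix} 1 & -3 & 4 & 7 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 8 & -5 & 6 \\ 0 & 1 & 4 & -9 & 3 \\ 0 & 0 & 1 & 1 & 2 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 7 & -2 & 0 & -8 & -3 \\ 0 & 0 & 1 & 1 & 6 & 5 \\ 0 & 0 & 0 & 1 & 3 & 9 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & -3 & 7 & 1 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$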

Answer: 

(a) $x_1 = -37,\ x_2 = -8,\ x_3 = 5$

(b) $x_1 = 13t - 10,\ x_2 = 13t - 5,\ x_3 = -t + 2,\ x_4 = t$

(c) $x_1 = -7s + 2t - 11,\ x_2 = s,\ x_3 = -3t - 4,\ x_4 = -3t + 9,\ x_5 = t$

(d) Inconsistent

4. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations 
to the given reduced row echelon form. Solve the system. 

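(a) $\begin{bmatrix} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 & -7 & 8 \\ 0 & 1 & 0 & 3 & 2 \\ 0 & 0 & 1 & 1 & -5 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & -6 & 0 & 0 & 3 & -2 \\ 0 & 0 & 1 & 0 & 4 & 7 \\ 0 & 0 & 0 & 1 & 5 & 8 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$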

In Exercises 5-8, solve the linear system by Gauss-Jordan elimination. 

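5.
$$\begin{aligned} x_1 + x_2 + 2x_3 &= 8 \\ -x_1 - 2x_2 + 3x_3 &= 1 \\ 3x_1 - 7x_2 + 4x_3 &= 10 \end{aligned}$$

Answer:

$x_1 = 3,\ x_2 = 1,\ x_3 = 2$

6.
$$\begin{aligned} 2x_1 + 2x_2 + 2x_3 &= 0 \\ -2x_1 + 5x_2 + 2x_3 &= 1 \\ 8x_1 + x_2 + 4x_3 &= -1 \end{aligned}$$

7.
$$\begin{aligned} x - y + 2z - w &= -1 \\ 2x + y - 2z - 2w &= -2 \\ -x + 2y - 4z + w &= 1 \\ 3x - 3w &= -3 \end{aligned}$$

Answer:

$x = t - 1,\ y = 2s,\ z = s,\ w = t$

8.
$$\begin{aligned} -2b + 3c &= 1 \\ 3a + 6b - 3c &= -2 \\ 6a + 6b + 3c &= 5 \end{aligned}$$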

In Exercises 9-12, solve the linear system by Gaussian elimination. 

9. Exercise 5 

Answer: 

$x_1 = 3,\ x_2 = 1,\ x_3 = 2$

10. Exercise 6 

11. Exercise 7 

Answer: 


$x = t - 1,\ y = 2s,\ z = s,\ w = t$

12. Exercise 8 

In Exercises 13-16, determine whether the homogeneous system has nontrivial solutions by inspection (without pencil 
and paper). 

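13.
$$\begin{aligned} 2x_1 - 3x_2 + 4x_3 - x_4 &= 0 \\ 7x_1 + x_2 - 8x_3 + 9x_4 &= 0 \\ 2x_1 + 8x_2 + x_3 - x_4 &= 0 \end{aligned}$$

Answer:

Has nontrivial solutions

14.
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 0 \\ x_2 - 8x_3 &= 0 \\ 4x_3 &= 0 \end{aligned}$$

15.
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= 0 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= 0 \end{aligned}$$

Answer:

Has nontrivial solutions

16.
$$\begin{aligned} 3x_1 - 2x_2 &= 0 \\ 6x_1 - 4x_2 &= 0 \end{aligned}$$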



In Exercises 17-24, solve the given linear system by any method.


17.
$$\begin{aligned} 2x_1 + x_2 + x_3 &= 0 \\ x_1 + 2x_2 &= 0 \\ x_2 + x_3 &= 0 \end{aligned}$$

Answer:

$x_1 = 0,\ x_2 = 0,\ x_3 = 0$

18.
$$\begin{aligned} 2x - y - 3z &= 0 \\ -x + 2y - 3z &= 0 \\ x + y + 4z &= 0 \end{aligned}$$

19.
$$\begin{aligned} 3x_1 + x_2 + x_3 + x_4 &= 0 \\ 5x_1 - x_2 + x_3 - x_4 &= 0 \end{aligned}$$

Answer:

$x_1 = -s,\ x_2 = -t - s,\ x_3 = 4s,\ x_4 = t$

20.
$$\begin{aligned} v + 3w - 2x &= 0 \\ 2u + v - 4w + 3x &= 0 \\ 2u + 3v + 2w - x &= 0 \\ -4u - 3v + 5w - 4x &= 0 \end{aligned}$$

21.
$$\begin{aligned} 2x + 2y + 4z &= 0 \\ w - y - 3z &= 0 \\ 2w + 3x + y + z &= 0 \\ -2w + x + 3y - 2z &= 0 \end{aligned}$$

Answer:

$w = t,\ x = -t,\ y = t,\ z = 0$

22.
$$\begin{aligned} x_1 + 3x_2 + x_4 &= 0 \\ x_1 + 4x_2 + 2x_3 &= 0 \\ -2x_2 - 2x_3 - x_4 &= 0 \\ 2x_1 - 4x_2 + x_3 + x_4 &= 0 \\ x_1 - 2x_2 - x_3 + x_4 &= 0 \end{aligned}$$

23.
$$\begin{aligned} 2I_1 - I_2 + 3I_3 + 4I_4 &= 9 \\ I_1 - 2I_3 + 7I_4 &= 11 \\ 3I_1 - 3I_2 + I_3 + 5I_4 &= 8 \\ 2I_1 + I_2 + 4I_3 + 4I_4 &= 10 \end{aligned}$$

Answer:

$I_1 = -1,\ I_2 = 0,\ I_3 = 1,\ I_4 = 2$

24.
$$\begin{aligned} Z_3 + Z_4 + Z_5 &= 0 \\ -Z_1 - Z_2 + 2Z_3 - 3Z_4 + Z_5 &= 0 \\ Z_1 + Z_2 - 2Z_3 - Z_5 &= 0 \\ 2Z_1 + 2Z_2 - Z_3 + Z_5 &= 0 \end{aligned}$$


In Exercises 25-28, determine the values of a for which the system has no solutions, exactly one solution, or infinitely
many solutions.

25.
$$\begin{aligned} x + 2y - 3z &= 4 \\ 3x - y + 5z &= 2 \\ 4x + y + (a^2 - 14)z &= a + 2 \end{aligned}$$

Answer:

If a = 4, there are infinitely many solutions; if a = -4, there are no solutions; if a ≠ ±4, there is exactly one
solution.

26.
$$\begin{aligned} x + 2y + z &= 2 \\ 2x - 2y + 3z &= 1 \\ x + 2y - (a^2 - 3)z &= a \end{aligned}$$

27.
$$\begin{aligned} x + 2y &= 1 \\ 2x + (a^2 - 5)y &= a - 1 \end{aligned}$$

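Answer:

If a = 3, there are infinitely many solutions; if a = -3, there are no solutions; if a ≠ ±3, there is exactly one
solution.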

$$\begin{bmatrix} 2 & 1 & 3 \\ 0 & -2 & -29 \\ 3 & 4 & 5 \end{bmatrix}$$

to reduced row echelon form without introducing fractions at any intermediate stage.

33. Show that the following nonlinear system has 18 solutions if $0 \le \alpha \le 2\pi$, $0 \le \beta \le 2\pi$, and $0 \le \gamma \le 2\pi$.

$$\begin{aligned} \sin\alpha + 2\cos\beta + 3\tan\gamma &= 0 \\ 2\sin\alpha + 5\cos\beta + 3\tan\gamma &= 0 \\ -\sin\alpha - 5\cos\beta + 5\tan\gamma &= 0 \end{aligned}$$

[Hint: Begin by making the substitutions $x = \sin\alpha$, $y = \cos\beta$, and $z = \tan\gamma$.]

34. Solve the following system of nonlinear equations for the unknown angles α, β, and γ, where 0 ≤ α ≤ 2π, 0 ≤ β ≤ 2π, and 0 ≤ γ < π.

$$\begin{aligned}2\sin\alpha - \cos\beta + 3\tan\gamma &= 3\\ 4\sin\alpha + 2\cos\beta - 2\tan\gamma &= 2\\ 6\sin\alpha - 3\cos\beta + \tan\gamma &= 9\end{aligned}$$


35. Solve the following system of nonlinear equations for x, y, and z.

$$\begin{aligned}x^2 + y^2 + z^2 &= 6\\ x^2 - y^2 + 2z^2 &= 2\\ 2x^2 + y^2 - z^2 &= 3\end{aligned}$$

[Hint: Begin by making the substitutions X = x², Y = y², Z = z².]

Answer:

x = ±1, y = ±√3, z = ±√2
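After the substitutions in the hint, Exercise 35 becomes a linear system in X, Y, and Z. As a minimal sketch (not part of the original text), the Python below solves that system by Gauss-Jordan elimination with exact fractions; the helper name `gauss_jordan` is hypothetical, and it assumes the system has exactly one solution. Taking square roots of X, Y, and Z then gives the stated answer.

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Reduce an n x (n+1) augmented matrix and return the unique solution.

    Assumes the coefficient part has a leading 1 in every column after
    reduction (i.e. exactly one solution); otherwise next() raises.
    """
    m = [[Fraction(v) for v in row] for row in aug]
    n = len(m)
    for col in range(n):
        # Find a row at or below `col` with a nonzero entry and swap it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale so that the pivot becomes a leading 1.
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        # Introduce zeros above and below the leading 1.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Exercise 35 with X = x^2, Y = y^2, Z = z^2.
X, Y, Z = gauss_jordan([[1, 1, 1, 6],
                        [1, -1, 2, 2],
                        [2, 1, -1, 3]])
print(X, Y, Z)  # 1 3 2
```

Since X = 1, Y = 3, and Z = 2, each of x, y, and z can take either sign, which is why the answer lists ±1, ±√3, and ±√2.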

36. Solve the following system for x, y, and z.

$$\frac{1}{x} + \frac{2}{y} - \frac{4}{z} = 1$$

— +—+— = 0 

X y z 

—i+—+— = 5 

x y z 

37. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is the graph of the equation $y = ax^3 + bx^2 + cx + d$.

[Figure Ex-37: the curve passes through the labeled points (0, 10), (1, 7), (3, −11), and (4, −14).]

Answer:

a = 1, b = −6, c = 2, d = 10
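Each labeled point (x, y) on the curve gives one linear equation $ax^3 + bx^2 + cx + d = y$ in the unknowns a, b, c, and d. The Python sketch below (an illustration, not the book's own method) builds the 4 × 5 augmented matrix from the four points read off Figure Ex-37 and reduces it with a hypothetical `gauss_jordan` helper that uses exact fractions and assumes a unique solution.

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Return the unique solution of an n x (n+1) augmented matrix."""
    m = [[Fraction(v) for v in row] for row in aug]
    n = len(m)
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]          # leading 1
        for r in range(n):                           # zeros elsewhere in column
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Points read from Figure Ex-37.
points = [(0, 10), (1, 7), (3, -11), (4, -14)]
# Row for (x, y): [x^3, x^2, x, 1 | y]
aug = [[x**3, x**2, x, 1, y] for x, y in points]
a, b, c, d = gauss_jordan(aug)
print(a, b, c, d)  # 1 -6 2 10
```

The first row has a zero in the first column, so the pivot search swaps in the next row; this is the same row interchange one would perform by hand.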

38. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is given by the equation $ax^2 + ay^2 + bx + cy + d = 0$.

[Figure Ex-38: the curve passes through the labeled points (−2, 7), (−4, 5), and (4, −3).]

39. If the linear system

$$\begin{aligned}a_1x + b_1y + c_1z &= 0\\ a_2x - b_2y + c_2z &= 0\\ a_3x + b_3y - c_3z &= 0\end{aligned}$$

has only the trivial solution, what can be said about the solutions of the following system?

$$\begin{aligned}a_1x + b_1y + c_1z &= 3\\ a_2x - b_2y + c_2z &= 7\\ a_3x + b_3y - c_3z &= 11\end{aligned}$$

Answer:

The nonhomogeneous system will have exactly one solution.

40. (a) If A is a 3 × 5 matrix, then what is the maximum possible number of leading 1's in its reduced row echelon form?

(b) If B is a 3 × 6 matrix whose last column has all zeros, then what is the maximum possible number of parameters in the general solution of the linear system with augmented matrix B?

(c) If C is a 5 × 3 matrix, then what is the minimum possible number of rows of zeros in any row echelon form of C?


41. (a) Prove that if ad − bc ≠ 0, then the reduced row echelon form of

$$\begin{bmatrix}a&b\\c&d\end{bmatrix}\quad\text{is}\quad\begin{bmatrix}1&0\\0&1\end{bmatrix}$$

(b) Use the result in part (a) to prove that if ad − bc ≠ 0, then the linear system

$$\begin{aligned}ax + by &= k\\ cx + dy &= l\end{aligned}$$

has exactly one solution.


42. Consider the system of equations

$$\begin{aligned}ax + by &= 0\\ cx + dy &= 0\\ ex + fy &= 0\end{aligned}$$

Discuss the relative positions of the lines ax + by = 0, cx + dy = 0, and ex + fy = 0 when (a) the system has only the trivial solution, and (b) the system has nontrivial solutions.








43. Describe all possible reduced row echelon forms of

(a) $\begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}$

(b) $\begin{bmatrix}a&b&c&d\\e&f&g&h\\i&j&k&l\\m&n&p&q\end{bmatrix}$

True-False Exercises 

In parts (a)-(i) determine whether the statement is true or false, and justify your answer. 

(a) If a matrix is in reduced row echelon form, then it is also in row echelon form. 

Answer: 

True 

(b) If an elementary row operation is applied to a matrix that is in row echelon form, the resulting matrix will still be in 
row echelon form. 

Answer: 

False 

(c) Every matrix has a unique row echelon form. 

Answer: 

False 

(d) A homogeneous linear system in n unknowns whose corresponding augmented matrix has a reduced row echelon form with r leading 1's has n − r free variables.

Answer: 

True 

(e) All leading 1's in a matrix in row echelon form must occur in different columns.

Answer: 

True 

(f) If every column of a matrix in row echelon form has a leading 1, then all entries that are not leading 1's are zero.
Answer: 

False 

(g) If a homogeneous linear system of n equations in n unknowns has a corresponding augmented matrix with a reduced row echelon form containing n leading 1's, then the linear system has only the trivial solution.

Answer: 


True 





(h) If the reduced row echelon form of the augmented matrix for a linear system has a row of zeros, then the system must 
have infinitely many solutions. 

Answer: 

False 

(i) If a linear system has more unknowns than equations, then it must have infinitely many solutions. 

Answer: 

False

1.3 Matrices and Matrix Operations

Rectangular arrays of real numbers arise in contexts other than as augmented matrices for linear systems. In this 
section we will begin to study matrices as objects in their own right by defining operations of addition, subtraction, 
and multiplication on them. 


Matrix Notation and Terminology 

In Section 1.2 we used rectangular arrays of numbers, called augmented matrices, to abbreviate systems of linear equations. However, rectangular arrays of numbers occur in other contexts as well. For example, the following rectangular array with three rows and seven columns might describe the number of hours that a student spent studying three subjects during a certain week:



              Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.   Sun.
Math            2      3       2      4        1      4      2
History         0      3       1      4        3      2      2
Language        4      1       3      1        0      0      2


If we suppress the headings, then we are left with the following rectangular array of numbers with three rows and seven columns, called a "matrix":

$$\begin{bmatrix}2&3&2&4&1&4&2\\0&3&1&4&3&2&2\\4&1&3&1&0&0&2\end{bmatrix}$$

More generally, we make the following definition. 

DEFINITION 1 

A matrix is a rectangular array of numbers. The numbers in the array are called the entries in the matrix. 


A matrix with only one column is called a column vector or a column matrix, and a matrix with only one row is called a row vector or a row matrix. In Example 1, the 2 × 1 matrix is a column vector, the 1 × 4 matrix is a row vector, and the 1 × 1 matrix is both a row vector and a column vector.


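EXAMPLE 1 Examples of Matrices

Some examples of matrices are

$$\begin{bmatrix}1&2\\3&0\\-1&4\end{bmatrix},\quad [2\ \ 1\ \ 0\ \ -3],\quad \begin{bmatrix}e&\pi&-\sqrt{2}\\0&\frac{1}{2}&1\\0&0&0\end{bmatrix},\quad \begin{bmatrix}1\\3\end{bmatrix},\quad [4]$$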



The size of a matrix is described in terms of the number of rows (horizontal lines) and columns (vertical lines) it contains. For example, the first matrix in Example 1 has three rows and two columns, so its size is 3 by 2 (written 3 × 2). In a size description, the first number always denotes the number of rows, and the second denotes the number of columns. The remaining matrices in Example 1 have sizes 1 × 4, 3 × 3, 2 × 1, and 1 × 1, respectively.


We will use capital letters to denote matrices and lowercase letters to denote numerical quantities; thus we might write

$$A = \begin{bmatrix}2&1&7\\3&4&2\end{bmatrix}\quad\text{or}\quad C = \begin{bmatrix}a&b&c\\d&e&f\end{bmatrix}$$

When discussing matrices, it is common to refer to numerical quantities as scalars. Unless stated otherwise, scalars will be real numbers; complex scalars will be considered later in the text.


Matrix brackets are often omitted from 1 × 1 matrices, making it impossible to tell, for example, whether the symbol 4 denotes the number "four" or the matrix [4]. This rarely causes problems because it is usually possible to tell which is meant from the context.


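The entry that occurs in row i and column j of a matrix A will be denoted by $a_{ij}$. Thus a general 3 × 4 matrix might be written as

$$A = \begin{bmatrix}a_{11}&a_{12}&a_{13}&a_{14}\\a_{21}&a_{22}&a_{23}&a_{24}\\a_{31}&a_{32}&a_{33}&a_{34}\end{bmatrix}$$

and a general m × n matrix as

$$A = \begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{bmatrix} \tag{1}$$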


When a compact notation is desired, the preceding matrix can be written as

$$[a_{ij}]_{m\times n}\quad\text{or}\quad [a_{ij}]$$

the first notation being used when it is important in the discussion to know the size, and the second being used when the size need not be emphasized. Usually, we will match the letter denoting a matrix with the letter denoting its entries; thus, for a matrix B we would generally use $b_{ij}$ for the entry in row i and column j, and for a matrix C we would use the notation $c_{ij}$.


The entry in row i and column j of a matrix A is also commonly denoted by the symbol $(A)_{ij}$. Thus, for Matrix (1) above, we have

$$(A)_{ij} = a_{ij}$$

and for the matrix

$$A = \begin{bmatrix}2&-3\\7&0\end{bmatrix}$$

we have $(A)_{11} = 2$, $(A)_{12} = -3$, $(A)_{21} = 7$, and $(A)_{22} = 0$.


Row and column vectors are of special importance, and it is common practice to denote them by boldface lowercase letters rather than capital letters. For such matrices, double subscripting of the entries is unnecessary. Thus a general 1 × n row vector a and a general m × 1 column vector b would be written as

$$\mathbf{a} = [a_1\ \ a_2\ \ \cdots\ \ a_n]\quad\text{and}\quad \mathbf{b} = \begin{bmatrix}b_1\\b_2\\\vdots\\b_m\end{bmatrix}$$



A matrix A with n rows and n columns is called a square matrix of order n, and the entries $a_{11}, a_{22}, \ldots, a_{nn}$ in (2) are said to be on the main diagonal of A.

$$\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\cdots&a_{nn}\end{bmatrix} \tag{2}$$


Operations on Matrices 

So far, we have used matrices to abbreviate the work in solving systems of linear equations. For other applications, 
however, it is desirable to develop an “arithmetic of matrices” in which matrices can be added, subtracted, and 
multiplied in a useful way. The remainder of this section will be devoted to developing this arithmetic. 


DEFINITION 2 

Two matrices are defined to be equal if they have the same size and their corresponding entries are equal. 




The equality of two matrices

$$A = [a_{ij}]\quad\text{and}\quad B = [b_{ij}]$$

of the same size can be expressed either by writing

$$(A)_{ij} = (B)_{ij}$$

or by writing

$$a_{ij} = b_{ij}$$

where it is understood that the equalities hold for all values of i and j.








EXAMPLE 2 Equality of Matrices

Consider the matrices

$$A = \begin{bmatrix}2&1\\3&x\end{bmatrix},\quad B = \begin{bmatrix}2&1\\3&5\end{bmatrix},\quad C = \begin{bmatrix}2&1&0\\3&4&0\end{bmatrix}$$

If x = 5, then A = B, but for all other values of x the matrices A and B are not equal, since not all of their corresponding entries are equal. There is no value of x for which A = C, since A and C have different sizes.




DEFINITION 3 

If A and B are matrices of the same size, then the sum A + B is the matrix obtained by adding the entries of B to the corresponding entries of A, and the difference A − B is the matrix obtained by subtracting the entries of B from the corresponding entries of A. Matrices of different sizes cannot be added or subtracted.


In matrix notation, if $A = [a_{ij}]$ and $B = [b_{ij}]$ have the same size, then

$$(A + B)_{ij} = (A)_{ij} + (B)_{ij} = a_{ij} + b_{ij}\quad\text{and}\quad (A - B)_{ij} = (A)_{ij} - (B)_{ij} = a_{ij} - b_{ij}$$

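EXAMPLE 3 Addition and Subtraction

Consider the matrices

$$A = \begin{bmatrix}2&1&0&3\\-1&0&2&4\\4&-2&7&0\end{bmatrix},\quad B = \begin{bmatrix}-4&3&5&1\\2&2&0&-1\\3&2&-4&5\end{bmatrix},\quad C = \begin{bmatrix}1&1\\2&2\end{bmatrix}$$

Then

$$A + B = \begin{bmatrix}-2&4&5&4\\1&2&2&3\\7&0&3&5\end{bmatrix}\quad\text{and}\quad A - B = \begin{bmatrix}6&-2&-5&2\\-3&-2&2&5\\1&-4&11&-5\end{bmatrix}$$

The expressions A + C, B + C, A − C, and B − C are undefined.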
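Definition 3 translates almost word for word into code. The following Python sketch is illustrative only: matrices are stored as lists of rows, and the helper names `same_size`, `mat_add`, and `mat_sub` are hypothetical. It reproduces Example 3 and shows that A + C is rejected because the sizes differ.

```python
def same_size(A, B):
    """True when A and B have the same number of rows and of columns."""
    return len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))

def mat_add(A, B):
    """Entrywise sum A + B (Definition 3)."""
    if not same_size(A, B):
        raise ValueError("matrices of different sizes cannot be added")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    """Entrywise difference A - B (Definition 3)."""
    if not same_size(A, B):
        raise ValueError("matrices of different sizes cannot be subtracted")
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2, 1, 0, 3], [-1, 0, 2, 4], [4, -2, 7, 0]]
B = [[-4, 3, 5, 1], [2, 2, 0, -1], [3, 2, -4, 5]]
C = [[1, 1], [2, 2]]

print(mat_add(A, B))  # [[-2, 4, 5, 4], [1, 2, 2, 3], [7, 0, 3, 5]]
print(mat_sub(A, B))  # [[6, -2, -5, 2], [-3, -2, 2, 5], [1, -4, 11, -5]]
try:
    mat_add(A, C)
except ValueError as err:
    print("A + C:", err)
```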




DEFINITION 4 

If A is any matrix and c is any scalar, then the product cA is the matrix obtained by multiplying each entry of 
the matrix A by c. The matrix cA is said to be a scalar multiple of A. 


In matrix notation, if $A = [a_{ij}]$, then

$$(cA)_{ij} = c(A)_{ij} = ca_{ij}$$


















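EXAMPLE 4 Scalar Multiples

For the matrices

$$A = \begin{bmatrix}2&3&4\\1&3&1\end{bmatrix},\quad B = \begin{bmatrix}0&2&7\\-1&3&-5\end{bmatrix},\quad C = \begin{bmatrix}9&-6&3\\3&0&12\end{bmatrix}$$

we have

$$2A = \begin{bmatrix}4&6&8\\2&6&2\end{bmatrix},\quad (-1)B = \begin{bmatrix}0&-2&-7\\1&-3&5\end{bmatrix},\quad \tfrac{1}{3}C = \begin{bmatrix}3&-2&1\\1&0&4\end{bmatrix}$$

It is common practice to denote (−1)B by −B.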
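Scalar multiplication (Definition 4) multiplies every entry by the same number. In this Python sketch (illustrative; `scalar_mul` is a hypothetical name), `fractions.Fraction` is used for the scalar 1/3 so that the entries of (1/3)C stay exact instead of becoming floating-point approximations.

```python
from fractions import Fraction

def scalar_mul(c, A):
    """Return cA: every entry of A multiplied by the scalar c (Definition 4)."""
    return [[c * a for a in row] for row in A]

A = [[2, 3, 4], [1, 3, 1]]
B = [[0, 2, 7], [-1, 3, -5]]
C = [[9, -6, 3], [3, 0, 12]]

two_A = scalar_mul(2, A)
minus_B = scalar_mul(-1, B)               # (-1)B, usually written -B
third_C = scalar_mul(Fraction(1, 3), C)   # exact arithmetic for (1/3)C
print(two_A)    # [[4, 6, 8], [2, 6, 2]]
print(minus_B)  # [[0, -2, -7], [1, -3, 5]]
```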


Thus far we have defined multiplication of a matrix by a scalar but not the multiplication of two matrices. Since 
matrices are added by adding corresponding entries and subtracted by subtracting corresponding entries, it would 
seem natural to define multiplication of matrices by multiplying corresponding entries. However, it turns out that such 
a definition would not be very useful for most problems. Experience has led mathematicians to the following more 
useful definition of matrix multiplication. 


DEFINITION 5 

If A is an m × r matrix and B is an r × n matrix, then the product AB is the m × n matrix whose entries are determined as follows: To find the entry in row i and column j of AB, single out row i from the matrix A and column j from the matrix B. Multiply the corresponding entries from the row and column together, and then add up the resulting products.


EXAMPLE 5 Multiplying Matrices

Consider the matrices

$$A = \begin{bmatrix}1&2&4\\2&6&0\end{bmatrix},\quad B = \begin{bmatrix}4&1&4&3\\0&-1&3&1\\2&7&5&2\end{bmatrix}$$

Since A is a 2 × 3 matrix and B is a 3 × 4 matrix, the product AB is a 2 × 4 matrix. To determine, for example, the entry in row 2 and column 3 of AB, we single out row 2 from A and column 3 from B. Then we multiply corresponding entries together and add up these products:

$$\text{row 2 of } A = [2\ \ 6\ \ 0],\quad \text{column 3 of } B = \begin{bmatrix}4\\3\\5\end{bmatrix},\qquad (2\cdot 4) + (6\cdot 3) + (0\cdot 5) = 26$$

The entry in row 1 and column 4 of AB is computed as follows:

$$\text{row 1 of } A = [1\ \ 2\ \ 4],\quad \text{column 4 of } B = \begin{bmatrix}3\\1\\2\end{bmatrix},\qquad (1\cdot 3) + (2\cdot 1) + (4\cdot 2) = 13$$

The computations for the remaining entries are

$$\begin{aligned}(1\cdot 4) + (2\cdot 0) + (4\cdot 2) &= 12\\ (1\cdot 1) - (2\cdot 1) + (4\cdot 7) &= 27\\ (1\cdot 4) + (2\cdot 3) + (4\cdot 5) &= 30\\ (2\cdot 4) + (6\cdot 0) + (0\cdot 2) &= 8\\ (2\cdot 1) - (6\cdot 1) + (0\cdot 7) &= -4\\ (2\cdot 3) + (6\cdot 1) + (0\cdot 2) &= 12\end{aligned}
\qquad
AB = \begin{bmatrix}12&27&30&13\\8&-4&26&12\end{bmatrix}$$
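The row-column rule of Definition 5 can be sketched in a few lines of Python. This is an illustration, not a library routine: `mat_mul` is a hypothetical helper that stores matrices as lists of rows. Note that Python indexes from 0, so the entry in row 2 and column 3 of AB is `AB[1][2]`.

```python
def mat_mul(A, B):
    """Row-column rule: (AB)_ij = sum over k of A[i][k] * B[k][j]."""
    r = len(B)  # number of rows of B
    if any(len(row) != r for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(len(A))]

A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]
AB = mat_mul(A, B)
print(AB)        # [[12, 27, 30, 13], [8, -4, 26, 12]]
print(AB[1][2])  # 26  (row 2, column 3 in the book's numbering)
```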


The definition of matrix multiplication requires that the number of columns of the first factor A be the same as the number of rows of the second factor B in order to form the product AB. If this condition is not satisfied, the product is undefined. A convenient way to determine whether a product of two matrices is defined is to write down the size of the first factor and, to the right of it, write down the size of the second factor. If, as in (3), the inside numbers are the same, then the product is defined. The outside numbers then give the size of the product.

$$\underset{m\times r}{A}\;\;\underset{r\times n}{B} \;=\; \underset{m\times n}{AB} \tag{3}$$

Here the inside numbers are r and r, and the outside numbers are m and n.



Gotthold Eisenstein (1823-1852) 


The concept of matrix multiplication is due to the German mathematician Gotthold 
Eisenstein, who introduced the idea around 1844 to simplify the process of making substitutions in linear 
systems. The idea was then expanded on and formalized by Cayley in his Memoir on the Theory of Matrices 
that was published in 1858. Eisenstein was a pupil of Gauss, who ranked him as the equal of Isaac Newton 
and Archimedes. However, Eisenstein, suffering from bad health his entire life, died at age 30, so his potential 
was never realized. 

[Image: Wikipedia]











EXAMPLE 6 Determining Whether a Product Is Defined

Suppose that A, B, and C are matrices with the following sizes:

A        B        C
3 × 4    4 × 7    7 × 3

Then by (3), AB is defined and is a 3 × 7 matrix; BC is defined and is a 4 × 3 matrix; and CA is defined and is a 7 × 4 matrix. The products AC, CB, and BA are all undefined.
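The "inside numbers / outside numbers" test in (3) needs only the sizes, not the entries. A small Python sketch (the function name `product_size` is hypothetical) that reproduces Example 6:

```python
def product_size(size_a, size_b):
    """Size of AB by rule (3), or None when the inside numbers differ."""
    (m, r_a), (r_b, n) = size_a, size_b
    return (m, n) if r_a == r_b else None

sizes = {"A": (3, 4), "B": (4, 7), "C": (7, 3)}
for left, right in ["AB", "BC", "CA", "AC", "CB", "BA"]:
    print(left + right, product_size(sizes[left], sizes[right]))
```

This prints the sizes (3, 7), (4, 3), and (7, 4) for AB, BC, and CA, and None for the three undefined products AC, CB, and BA.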


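If $A = [a_{ij}]$ is an m × r matrix and $B = [b_{ij}]$ is an r × n matrix, then, as illustrated in (4), where row i of A and column j of B are singled out,

$$AB = \begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1r}\\a_{21}&a_{22}&\cdots&a_{2r}\\\vdots&\vdots&&\vdots\\a_{i1}&a_{i2}&\cdots&a_{ir}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mr}\end{bmatrix}\begin{bmatrix}b_{11}&b_{12}&\cdots&b_{1j}&\cdots&b_{1n}\\b_{21}&b_{22}&\cdots&b_{2j}&\cdots&b_{2n}\\\vdots&\vdots&&\vdots&&\vdots\\b_{r1}&b_{r2}&\cdots&b_{rj}&\cdots&b_{rn}\end{bmatrix} \tag{4}$$

the entry $(AB)_{ij}$ in row i and column j of AB is given by

$$(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} + \cdots + a_{ir}b_{rj} \tag{5}$$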


Partitioned Matrices 


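A matrix can be subdivided or partitioned into smaller matrices by inserting horizontal and vertical rules between selected rows and columns. For example, the following are three possible partitions of a general 3 × 4 matrix A: the first is a partition of A into four submatrices $A_{11}$, $A_{12}$, $A_{21}$, and $A_{22}$; the second is a partition of A into its row vectors $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$; and the third is a partition of A into its column vectors $\mathbf{c}_1$, $\mathbf{c}_2$, $\mathbf{c}_3$, and $\mathbf{c}_4$:

$$A = \left[\begin{array}{ccc|c}a_{11}&a_{12}&a_{13}&a_{14}\\a_{21}&a_{22}&a_{23}&a_{24}\\\hline a_{31}&a_{32}&a_{33}&a_{34}\end{array}\right] = \begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}$$

$$A = \left[\begin{array}{cccc}a_{11}&a_{12}&a_{13}&a_{14}\\\hline a_{21}&a_{22}&a_{23}&a_{24}\\\hline a_{31}&a_{32}&a_{33}&a_{34}\end{array}\right] = \begin{bmatrix}\mathbf{r}_1\\\mathbf{r}_2\\\mathbf{r}_3\end{bmatrix}$$

$$A = \left[\begin{array}{c|c|c|c}a_{11}&a_{12}&a_{13}&a_{14}\\a_{21}&a_{22}&a_{23}&a_{24}\\a_{31}&a_{32}&a_{33}&a_{34}\end{array}\right] = [\mathbf{c}_1\ \ \mathbf{c}_2\ \ \mathbf{c}_3\ \ \mathbf{c}_4]$$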


Matrix Multiplication by Columns and by Rows 

Partitioning has many uses, one of which is for finding particular rows or columns of a matrix product AB without 
computing the entire product. Specifically, the following formulas, whose proofs are left as exercises, show how 
individual column vectors of AB can be obtained by partitioning B into column vectors and how individual row 
vectors of AB can be obtained by partitioning A into row vectors. 
















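$$AB = A[\mathbf{b}_1\ \ \mathbf{b}_2\ \ \cdots\ \ \mathbf{b}_n] = [A\mathbf{b}_1\ \ A\mathbf{b}_2\ \ \cdots\ \ A\mathbf{b}_n] \tag{6}$$

(AB computed column by column)

$$AB = \begin{bmatrix}\mathbf{a}_1\\\mathbf{a}_2\\\vdots\\\mathbf{a}_m\end{bmatrix}B = \begin{bmatrix}\mathbf{a}_1B\\\mathbf{a}_2B\\\vdots\\\mathbf{a}_mB\end{bmatrix} \tag{7}$$

(AB computed row by row)

In words, these formulas state that

$$j\text{th column vector of } AB = A\,[j\text{th column vector of } B] \tag{8}$$

$$i\text{th row vector of } AB = [i\text{th row vector of } A]\,B \tag{9}$$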
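Formulas (8) and (9) say that a single column or row of AB can be computed without forming the whole product. The Python sketch below is illustrative; the helpers `mat_mul`, `col_of`, and `row_of` are hypothetical names, and indices are 0-based, so the book's "second column" is `col_of(B, 1)`. It reproduces the two computations of Example 7.

```python
def mat_mul(A, B):
    """Row-column rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def col_of(M, j):
    """Column j of M as an m x 1 matrix (0-based j)."""
    return [[r[j]] for r in M]

def row_of(M, i):
    """Row i of M as a 1 x n matrix (0-based i)."""
    return [list(M[i])]

A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]

second_col = mat_mul(A, col_of(B, 1))  # formula (8) with j = 2
first_row = mat_mul(row_of(A, 0), B)   # formula (9) with i = 1
print(second_col)  # [[27], [-4]]
print(first_row)   # [[12, 27, 30, 13]]
```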


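EXAMPLE 7 Example 5 Revisited

If A and B are the matrices in Example 5, then from (8) the second column vector of AB can be obtained by the computation

$$\begin{bmatrix}1&2&4\\2&6&0\end{bmatrix}\begin{bmatrix}1\\-1\\7\end{bmatrix} = \begin{bmatrix}27\\-4\end{bmatrix}$$

in which [1; −1; 7] is the second column of B and [27; −4] is the second column of AB, and from (9) the first row vector of AB can be obtained by the computation

$$[1\ \ 2\ \ 4]\begin{bmatrix}4&1&4&3\\0&-1&3&1\\2&7&5&2\end{bmatrix} = [12\ \ 27\ \ 30\ \ 13]$$

in which [1 2 4] is the first row of A and [12 27 30 13] is the first row of AB.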


Matrix Products as Linear Combinations 

We have discussed three methods for computing a matrix product AB —entry by entry, column by column, and row by 
row. The following definition provides yet another way of thinking about matrix multiplication. 

DEFINITION 6

If $A_1, A_2, \ldots, A_r$ are matrices of the same size, and if $c_1, c_2, \ldots, c_r$ are scalars, then an expression of the form

$$c_1A_1 + c_2A_2 + \cdots + c_rA_r$$

is called a linear combination of $A_1, A_2, \ldots, A_r$ with coefficients $c_1, c_2, \ldots, c_r$.


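To see how matrix products can be viewed as linear combinations, let A be an m × n matrix and x an n × 1 column vector, say

$$A = \begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{bmatrix}\quad\text{and}\quad \mathbf{x} = \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}$$

Then

$$A\mathbf{x} = \begin{bmatrix}a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\\vdots\\a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n\end{bmatrix} = x_1\begin{bmatrix}a_{11}\\a_{21}\\\vdots\\a_{m1}\end{bmatrix} + x_2\begin{bmatrix}a_{12}\\a_{22}\\\vdots\\a_{m2}\end{bmatrix} + \cdots + x_n\begin{bmatrix}a_{1n}\\a_{2n}\\\vdots\\a_{mn}\end{bmatrix} \tag{10}$$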


This proves the following theorem. 


THEOREM 1.3.1 

If A is an m × n matrix, and if x is an n × 1 column vector, then the product Ax can be expressed as a linear combination of the column vectors of A in which the coefficients are the entries of x.


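EXAMPLE 8 Matrix Products as Linear Combinations

The matrix product

$$\begin{bmatrix}-1&3&2\\1&2&-3\\2&1&-2\end{bmatrix}\begin{bmatrix}2\\-1\\3\end{bmatrix} = \begin{bmatrix}1\\-9\\-3\end{bmatrix}$$

can be written as the following linear combination of column vectors:

$$2\begin{bmatrix}-1\\1\\2\end{bmatrix} - 1\begin{bmatrix}3\\2\\1\end{bmatrix} + 3\begin{bmatrix}2\\-3\\-2\end{bmatrix} = \begin{bmatrix}1\\-9\\-3\end{bmatrix}$$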
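Theorem 1.3.1 can be checked numerically by building Ax in two ways: as $x_1$ times column 1 plus $x_2$ times column 2 plus $x_3$ times column 3, and by the row-column rule. This plain-Python sketch (variable names are illustrative only) uses the data of Example 8.

```python
A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
x = [2, -1, 3]
m, n = len(A), len(A[0])

# Theorem 1.3.1: Ax = x_1*(column 1) + x_2*(column 2) + ... + x_n*(column n)
combo = [0] * m
for j in range(n):
    for i in range(m):
        combo[i] += x[j] * A[i][j]

# The row-column rule gives the same vector.
direct = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
print(combo, direct)  # [1, -9, -3] [1, -9, -3]
```

The outer loop runs over columns, so after each pass `combo` holds a partial linear combination; this mirrors the right-hand side of (10) rather than its middle expression.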


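EXAMPLE 9 Columns of a Product AB as Linear Combinations

We showed in Example 5 that

$$AB = \begin{bmatrix}1&2&4\\2&6&0\end{bmatrix}\begin{bmatrix}4&1&4&3\\0&-1&3&1\\2&7&5&2\end{bmatrix} = \begin{bmatrix}12&27&30&13\\8&-4&26&12\end{bmatrix}$$

It follows from Formula (6) and Theorem 1.3.1 that the jth column vector of AB can be expressed as a linear combination of the column vectors of A in which the coefficients in the linear combination are the entries from the jth column of B. The computations are as follows:

$$\begin{aligned}\begin{bmatrix}12\\8\end{bmatrix} &= 4\begin{bmatrix}1\\2\end{bmatrix} + 0\begin{bmatrix}2\\6\end{bmatrix} + 2\begin{bmatrix}4\\0\end{bmatrix}\\ \begin{bmatrix}27\\-4\end{bmatrix} &= \begin{bmatrix}1\\2\end{bmatrix} - \begin{bmatrix}2\\6\end{bmatrix} + 7\begin{bmatrix}4\\0\end{bmatrix}\\ \begin{bmatrix}30\\26\end{bmatrix} &= 4\begin{bmatrix}1\\2\end{bmatrix} + 3\begin{bmatrix}2\\6\end{bmatrix} + 5\begin{bmatrix}4\\0\end{bmatrix}\\ \begin{bmatrix}13\\12\end{bmatrix} &= 3\begin{bmatrix}1\\2\end{bmatrix} + \begin{bmatrix}2\\6\end{bmatrix} + 2\begin{bmatrix}4\\0\end{bmatrix}\end{aligned}$$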


Matrix Form of a Linear System 

Matrix multiplication has an important application to systems of linear equations. Consider a system of m linear equations in n unknowns:

$$\begin{aligned}a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m\end{aligned}$$

Since two matrices are equal if and only if their corresponding entries are equal, we can replace the m equations in this system by the single matrix equation

$$\begin{bmatrix}a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\\vdots\\a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n\end{bmatrix} = \begin{bmatrix}b_1\\b_2\\\vdots\\b_m\end{bmatrix}$$

The m × 1 matrix on the left side of this equation can be written as a product to give

$$\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix} = \begin{bmatrix}b_1\\b_2\\\vdots\\b_m\end{bmatrix}$$

If we designate these matrices by A, x, and b, respectively, then the original system of m equations in n unknowns has been replaced by the single matrix equation

$$A\mathbf{x} = \mathbf{b}$$

The matrix A in this equation is called the coefficient matrix of the system. The augmented matrix for the system is obtained by adjoining b to A as the last column; thus the augmented matrix is

$$[A \mid \mathbf{b}] = \left[\begin{array}{cccc|c}a_{11}&a_{12}&\cdots&a_{1n}&b_1\\a_{21}&a_{22}&\cdots&a_{2n}&b_2\\\vdots&\vdots&&\vdots&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}&b_m\end{array}\right]$$























































The vertical bar in [A | b] is a convenient way to separate A from b visually; it has no mathematical significance.
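To connect the matrix form back to Section 1.2, here is a small Python sketch (illustrative only; `mat_vec` is a hypothetical helper). It takes the system of Exercise 23 in Exercise Set 1.2, with unknowns $I_1, \ldots, I_4$, forms its augmented matrix by adjoining b to A as the last column, and checks that the answer given there, $(I_1, I_2, I_3, I_4) = (-1, 0, 1, 2)$, satisfies Ax = b.

```python
# Coefficient matrix A and right-hand side b of Exercise 23 (Exercise Set 1.2)
A = [[2, -1, 3, 4],
     [1, 0, -2, 7],
     [3, -3, 1, 5],
     [2, 1, 4, 4]]
b = [9, 11, 8, 10]

# Augmented matrix [A | b]: adjoin b to A as the last column.
augmented = [row + [bi] for row, bi in zip(A, b)]

def mat_vec(A, x):
    """Ax computed entry by entry with the row-column rule."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

x = [-1, 0, 1, 2]
print(augmented[0])        # [2, -1, 3, 4, 9]
print(mat_vec(A, x) == b)  # True
```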


Transpose of a Matrix 

We conclude this section by defining two matrix operations that have no analogs in the arithmetic of real numbers.

DEFINITION 7

If A is any m × n matrix, then the transpose of A, denoted by $A^T$, is defined to be the n × m matrix that results by interchanging the rows and columns of A; that is, the first column of $A^T$ is the first row of A, the second column of $A^T$ is the second row of A, and so forth.


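EXAMPLE 10 Some Transposes

The following are some examples of matrices and their transposes.

$$A = \begin{bmatrix}a_{11}&a_{12}&a_{13}&a_{14}\\a_{21}&a_{22}&a_{23}&a_{24}\\a_{31}&a_{32}&a_{33}&a_{34}\end{bmatrix},\quad B = \begin{bmatrix}2&3\\1&4\\5&6\end{bmatrix},\quad C = [1\ \ 3\ \ 5],\quad D = [4]$$

$$A^T = \begin{bmatrix}a_{11}&a_{21}&a_{31}\\a_{12}&a_{22}&a_{32}\\a_{13}&a_{23}&a_{33}\\a_{14}&a_{24}&a_{34}\end{bmatrix},\quad B^T = \begin{bmatrix}2&1&5\\3&4&6\end{bmatrix},\quad C^T = \begin{bmatrix}1\\3\\5\end{bmatrix},\quad D^T = [4]$$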


Observe that not only are the columns of $A^T$ the rows of A, but the rows of $A^T$ are the columns of A. Thus the entry in row i and column j of $A^T$ is the entry in row j and column i of A; that is,

$$(A^T)_{ij} = (A)_{ji} \tag{11}$$

Note the reversal of the subscripts.

In the special case where A is a square matrix, the transpose of A can be obtained by interchanging entries that are symmetrically positioned about the main diagonal. In (12) we see that $A^T$ can also be obtained by "reflecting" A about its main diagonal.

$$A = \begin{bmatrix}1&-2&4\\3&7&0\\-5&8&6\end{bmatrix}\;\longrightarrow\; A^T = \begin{bmatrix}1&3&-5\\-2&7&8\\4&0&6\end{bmatrix} \tag{12}$$
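In Python, transposing a matrix stored as a list of rows is a one-liner with `zip(*A)`, which groups the first entries of all rows, then the second entries, and so on. The sketch below (illustrative; `transpose` is a hypothetical helper) reproduces (12) and $B^T$ from Example 10 and checks Formula (11) entry by entry.

```python
def transpose(A):
    """Return A^T: row i of the result is column i of A."""
    return [list(col) for col in zip(*A)]

A = [[1, -2, 4], [3, 7, 0], [-5, 8, 6]]
B = [[2, 3], [1, 4], [5, 6]]

At = transpose(A)
print(At)            # [[1, 3, -5], [-2, 7, 8], [4, 0, 6]]
print(transpose(B))  # [[2, 1, 5], [3, 4, 6]]

# Formula (11): (A^T)_ij = (A)_ji for every i and j
print(all(At[i][j] == A[j][i] for i in range(3) for j in range(3)))  # True
```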


DEFINITION 8 

If A is a square matrix, then the trace of A, denoted by tr (A), is defined to be the sum of the entries on the 
main diagonal of A. The trace of A is undefined if A is not a square matrix. 





James Sylvester (1814-1897) 



Arthur Cayley (1821-1895) 


The term matrix was first used by the English mathematician (and lawyer) James Sylvester, who defined the term in 1850 to be an "oblong arrangement of terms." Sylvester communicated his work on matrices to a fellow English mathematician and lawyer named Arthur Cayley, who then introduced some of the basic operations on matrices in a book entitled Memoir on the Theory of Matrices that was published in 1858. As a matter of interest, Sylvester, who was Jewish, did not get his college degree because he refused to sign a required oath to the Church of England. He was appointed to a chair at the University of Virginia in the United States but resigned after swatting with a stick a student who was reading a newspaper in class. Sylvester, thinking he had killed the student, fled back to England on the first available ship. Fortunately, the student was not dead, just in shock!

[Images: The Granger Collection, New York]


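EXAMPLE 11 Trace of a Matrix

The following are examples of matrices and their traces.

$$A = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix},\qquad B = \begin{bmatrix}-1&2&7&0\\3&5&-8&4\\1&2&7&-3\\4&-2&1&0\end{bmatrix}$$

$$\operatorname{tr}(A) = a_{11} + a_{22} + a_{33},\qquad \operatorname{tr}(B) = -1 + 5 + 7 + 0 = 11$$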


In the exercises you will have some practice working with the transpose and trace operations. 
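As a quick illustration of Definition 8, the following Python sketch (hypothetical helper name `trace`) sums the main-diagonal entries and refuses nonsquare input, since the trace is undefined there. It uses the square matrix A from (12), whose diagonal entries are 1, 7, and 6.

```python
def trace(A):
    """Sum of the main-diagonal entries; defined only for square matrices."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the trace is undefined for a nonsquare matrix")
    return sum(A[i][i] for i in range(n))

A = [[1, -2, 4], [3, 7, 0], [-5, 8, 6]]  # the matrix A in (12)
print(trace(A))  # 14

try:
    trace([[2, 3], [1, 4], [5, 6]])      # the 3 x 2 matrix B of Example 10
except ValueError as err:
    print(err)
```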


Concept Review 

• Matrix
• Entries
• Column vector (or column matrix)
• Row vector (or row matrix)
• Square matrix
• Main diagonal
• Equal matrices
• Matrix operations: sum, difference, scalar multiplication
• Linear combination of matrices
• Product of matrices (matrix multiplication)
• Partitioned matrices
• Submatrices
• Row-column method
• Column method
• Row method
• Coefficient matrix of a linear system
• Transpose
• Trace

Skills 






• Determine the size of a given matrix.
• Identify the row vectors and column vectors of a given matrix.
• Perform the arithmetic operations of matrix addition, subtraction, scalar multiplication, and multiplication.
• Determine whether the product of two given matrices is defined.
• Compute matrix products using the row-column method, the column method, and the row method.
• Express the product of a matrix and a column vector as a linear combination of the columns of the matrix.
• Express a linear system as a matrix equation, and identify the coefficient matrix.
• Compute the transpose of a matrix.
• Compute the trace of a square matrix.


Exercise Set 1.3

1. Suppose that A, B, C, D, and E are matrices with the following sizes:

A        B        C        D        E
(4 × 5)  (4 × 5)  (5 × 2)  (4 × 2)  (5 × 4)

In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of the resulting matrix.

(a) BA
(b) AC + D
(c) AE + B
(d) AB + B
(e) E(A + B)
(f) E(AC)
(g) $E^TA$
(h) $(A^T + E)D$

Answer:

(a) Undefined
(b) 4 × 2
(c) Undefined
(d) Undefined
(e) 5 × 5
(f) 5 × 2
(g) Undefined
(h) 5 × 2


2. Suppose that A, B, C, D, and E are matrices with the following sizes:

A        B        C        D        E
(3 × 1)  (3 × 6)  (6 × 2)  (2 × 6)  (1 × 3)

In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of the resulting matrix.

(a) EA
(b) $AB^T$
(c) $B^T(A + E^T)$
(d) 2A + C
(e) $(C^T + D)B^T$
(f) $CD + B^TE^T$
(g) $(BD^T)C^T$
(h) DC + EA


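3. Consider the matrices

$$A = \begin{bmatrix}3&0\\-1&2\\1&1\end{bmatrix},\quad B = \begin{bmatrix}4&-1\\0&2\end{bmatrix},\quad C = \begin{bmatrix}1&4&2\\3&1&5\end{bmatrix},$$

$$D = \begin{bmatrix}1&5&2\\-1&0&1\\3&2&4\end{bmatrix},\quad E = \begin{bmatrix}6&1&3\\-1&1&2\\4&1&3\end{bmatrix}$$

In each part, compute the given expression (where possible).

(a) D + E
(b) D − E
(c) 5A
(d) −7C
(e) 2B − C
(f) 4E − 2D
(g) −3(D + 2E)
(h) A − A
(i) tr(D)
(j) tr(D − 3E)
(k) 4 tr(7B)
(l) tr(A)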

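Answer:

(a) $\begin{bmatrix}7&6&5\\-2&1&3\\7&3&7\end{bmatrix}$

(b) $\begin{bmatrix}-5&4&-1\\0&-1&-1\\-1&1&1\end{bmatrix}$

(c) $\begin{bmatrix}15&0\\-5&10\\5&5\end{bmatrix}$

(d) $\begin{bmatrix}-7&-28&-14\\-21&-7&-35\end{bmatrix}$

(e) Undefined

(f) $\begin{bmatrix}22&-6&8\\-2&4&6\\10&0&4\end{bmatrix}$

(g) $\begin{bmatrix}-39&-21&-24\\9&-6&-15\\-33&-12&-30\end{bmatrix}$

(h) $\begin{bmatrix}0&0\\0&0\\0&0\end{bmatrix}$

(i) 5

(j) −25

(k) 168

(l) Undefined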


4. Using the matrices in Exercise 3, in each part compute the given expression (where possible).

(a) $2A^T + C$
(b) $D^T - B^T$
(c) $(D - E)^T$
(d) $B^T + 5C^T$
(e) $\frac{1}{2}C^T - \frac{1}{4}A$
(f) $B - B^T$
(g) $2E^T - 3D^T$
(h) $(2E^T - 3D^T)^T$
(i) (CD)E
(j) C(BA)
(k) $\operatorname{tr}(DE^T)$
(l) tr(BC)


5. Using the matrices in Exercise 3, in each part compute the given expression (where possible).

(a) AB
(b) BA
(c) (3E)D
(d) (AB)C
(e) A(BC)
(f) $CC^T$
(g) $(DA)^T$
(h) $(C^TB)A^T$
(i) $\operatorname{tr}(DD^T)$
(j) $\operatorname{tr}(4E^T - D)$
(k) $\operatorname{tr}(C^TA^T + 2E^T)$
(l) $\operatorname{tr}((EC^T)^TA)$

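Answer:

(a) $\begin{bmatrix}12&-3\\-4&5\\4&1\end{bmatrix}$

(b) Undefined

(c) $\begin{bmatrix}42&108&75\\12&-3&21\\36&78&63\end{bmatrix}$

(d) $\begin{bmatrix}3&45&9\\11&-11&17\\7&17&13\end{bmatrix}$

(e) $\begin{bmatrix}3&45&9\\11&-11&17\\7&17&13\end{bmatrix}$

(f) $\begin{bmatrix}21&17\\17&35\end{bmatrix}$

(g) $\begin{bmatrix}0&-2&11\\12&1&8\end{bmatrix}$

(h) $\begin{bmatrix}12&6&9\\48&-20&14\\24&8&16\end{bmatrix}$

(i) 61

(j) 35

(k) 28

(l) 99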





6. Using the matrices in Exercise 3, in each part compute the given expression (where possible).

(a) $(2D^T - E)A$
(b) (4B)C + 2B
(c) $(-AC)^T + 5D^T$
(d) $(BA^T - 2C)^T$
(e) $B^T(CC^T - A^TA)$
(f) $D^TE^T - (BD)^T$

7. Let

$$A = \begin{bmatrix}3&-2&7\\6&5&4\\0&4&9\end{bmatrix}\quad\text{and}\quad B = \begin{bmatrix}6&-2&4\\0&1&3\\7&7&5\end{bmatrix}$$

Use the row method or column method (as appropriate) to find

(a) the first row of AB.
(b) the third row of AB.
(c) the second column of AB.
(d) the first column of BA.
(e) the third row of AA.
(f) the third column of AA.

Answer:

(a) [67 41 41]

(b) [63 67 57]

(c) $\begin{bmatrix}41\\21\\67\end{bmatrix}$

(d) $\begin{bmatrix}6\\6\\63\end{bmatrix}$

(e) [24 56 97]

(f) $\begin{bmatrix}76\\98\\97\end{bmatrix}$

8. Referring to the matrices in Exercise 7, use the row method or column method (as appropriate) to find 

(a) the first column of AB. 

(b) the third column of BB. 

(c) the second row of BB. 

(d) the first column of AA. 

(e) the third column of AB. 

(f) the first row of BA. 

9. Referring to the matrices A and B in Exercise 7, and Example 9, 

(a) express each column vector of AA as a linear combination of the column vectors of A.

(b) express each column vector of BB as a linear combination of the column vectors of B. 


Answer: 




10. Referring to the matrices A and B in Exercise 7, and Example 9, 

(a) express each column vector of AB as a linear combination of the column vectors of A. 

(b) express each column vector of BA as a linear combination of the column vectors of B. 

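11. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation Ax = b, and write out this matrix equation.

(a)
$$\begin{aligned}2x_1 - 3x_2 + 5x_3 &= 7\\ 9x_1 - x_2 + x_3 &= -1\\ x_1 + 5x_2 + 4x_3 &= 0\end{aligned}$$

(b)
$$\begin{aligned}4x_1 - 3x_3 + x_4 &= 1\\ 5x_1 + x_2 - 8x_4 &= 3\\ 2x_1 - 5x_2 + 9x_3 - x_4 &= 0\\ 3x_2 - x_3 + 7x_4 &= 2\end{aligned}$$

Answer:

(a)
$$\begin{bmatrix}2&-3&5\\9&-1&1\\1&5&4\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}7\\-1\\0\end{bmatrix}$$

(b)
$$\begin{bmatrix}4&0&-3&1\\5&1&0&-8\\2&-5&9&-1\\0&3&-1&7\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}1\\3\\0\\2\end{bmatrix}$$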

12. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation Ax = b, and write out this matrix equation.

(a)
$$\begin{aligned}x_1 - 2x_2 + 3x_3 &= -3\\ 2x_1 + x_2 &= 0\\ -3x_2 + 4x_3 &= 1\\ x_1 + x_3 &= 5\end{aligned}$$

(b)
$$\begin{aligned}3x_1 + 3x_2 + 3x_3 &= -3\\ -x_1 - 5x_2 - 2x_3 &= 3\\ -4x_2 + x_3 &= 0\end{aligned}$$

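13. In each part, express the matrix equation as a system of linear equations.

(a)
$$\begin{bmatrix}5&6&-7\\-1&-2&3\\0&4&-1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}2\\0\\3\end{bmatrix}$$

(b)
$$\begin{bmatrix}1&1&1\\2&3&0\\5&-3&-6\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}2\\2\\-9\end{bmatrix}$$

Answer:

(a)
$$\begin{aligned}5x_1 + 6x_2 - 7x_3 &= 2\\ -x_1 - 2x_2 + 3x_3 &= 0\\ 4x_2 - x_3 &= 3\end{aligned}$$

(b)
$$\begin{aligned}x_1 + x_2 + x_3 &= 2\\ 2x_1 + 3x_2 &= 2\\ 5x_1 - 3x_2 - 6x_3 &= -9\end{aligned}$$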


14. In each part, express the matrix equation as a system of linear equations. 


(a) 

-1 

L»0 

to 

'xf 


2 



4 3 7 

X2 

= 

-1 



-2 1 5 

X3 


4 


(b)
$$\begin{bmatrix}3&-2&0&1\\5&0&2&-2\\3&1&4&7\\-2&5&1&6\end{bmatrix}\begin{bmatrix}w\\x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}$$


In Exercises 15-16, find all values of k, if any, that satisfy the equation.

15.
$$[k\ \ 1\ \ 1]\begin{bmatrix}1&1&0\\1&0&2\\0&2&-3\end{bmatrix}\begin{bmatrix}k\\1\\1\end{bmatrix} = 0$$

Answer:

k = −1


16. 

O 

CM 

"2‘ 

[2 2 k] 

2 0 3 

2 


-1 

CO 

0 

k 


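In Exercises 17-18, solve the matrix equation for a, b, c, and d.

17.
$$\begin{bmatrix}a&3\\-1&a+b\end{bmatrix} = \begin{bmatrix}4&d-2c\\d+2c&-2\end{bmatrix}$$

Answer:

a = 4, b = −6, c = −1, d = 1

18.
$$\begin{bmatrix}a-b&b+a\\3d+c&2d-c\end{bmatrix} = \begin{bmatrix}8&1\\7&6\end{bmatrix}$$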


19. Let A be any m × n matrix and let 0 be the m × n matrix each of whose entries is zero. Show that if kA = 0, then k = 0 or A = 0.

20. (a) Show that if AB and BA are both defined, then AB and BA are square matrices.

(b) Show that if A is an m × n matrix and A(BA) is defined, then B is an n × m matrix.

21. Prove: If A and B are n × n matrices, then tr(A + B) = tr(A) + tr(B).

22. (a) Show that if A has a row of zeros and B is any matrix for which AB is defined, then AB also has a row of zeros.

(b) Find a similar result involving a column of zeros.

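23. In each part, find a 6 × 6 matrix $[a_{ij}]$ that satisfies the stated condition. Make your answers as general as possible by using letters rather than specific numbers for the nonzero entries.

(a) $a_{ij} = 0$ if $i \ne j$
(b) $a_{ij} = 0$ if $i > j$
(c) $a_{ij} = 0$ if $i < j$
(d) $a_{ij} = 0$ if $|i - j| > 1$

Answer:

(a)
$$\begin{bmatrix}a_{11}&0&0&0&0&0\\0&a_{22}&0&0&0&0\\0&0&a_{33}&0&0&0\\0&0&0&a_{44}&0&0\\0&0&0&0&a_{55}&0\\0&0&0&0&0&a_{66}\end{bmatrix}$$

(b)
$$\begin{bmatrix}a_{11}&a_{12}&a_{13}&a_{14}&a_{15}&a_{16}\\0&a_{22}&a_{23}&a_{24}&a_{25}&a_{26}\\0&0&a_{33}&a_{34}&a_{35}&a_{36}\\0&0&0&a_{44}&a_{45}&a_{46}\\0&0&0&0&a_{55}&a_{56}\\0&0&0&0&0&a_{66}\end{bmatrix}$$

(c)
$$\begin{bmatrix}a_{11}&0&0&0&0&0\\a_{21}&a_{22}&0&0&0&0\\a_{31}&a_{32}&a_{33}&0&0&0\\a_{41}&a_{42}&a_{43}&a_{44}&0&0\\a_{51}&a_{52}&a_{53}&a_{54}&a_{55}&0\\a_{61}&a_{62}&a_{63}&a_{64}&a_{65}&a_{66}\end{bmatrix}$$

(d)
$$\begin{bmatrix}a_{11}&a_{12}&0&0&0&0\\a_{21}&a_{22}&a_{23}&0&0&0\\0&a_{32}&a_{33}&a_{34}&0&0\\0&0&a_{43}&a_{44}&a_{45}&0\\0&0&0&a_{54}&a_{55}&a_{56}\\0&0&0&0&a_{65}&a_{66}\end{bmatrix}$$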


24. Find the 4 × 4 matrix $A = [a_{ij}]$ whose entries satisfy the stated condition.
(a) $a_{ij} = i + j$

(c) $a_{ij} = \begin{cases}1 & \text{if } |i-j| > 1\\ -1 & \text{if } |i-j| \le 1\end{cases}$

25. Consider the function y = f(x) defined for 2 × 1 matrices x by y = Ax, where

Plot f(x) together with x in each case below. How would you describe the action of f?

(a) $x = \begin{bmatrix}1\\1\end{bmatrix}$

(b) $x = \begin{bmatrix}2\\0\end{bmatrix}$

(c) x =


26. Let I be the n × n matrix whose entry in row i and column j is

$$\begin{cases}1 & \text{if } i = j\\ 0 & \text{if } i \ne j\end{cases}$$

Show that AI = IA = A for every n × n matrix A.

27. How many 3 × 3 matrices A can you find such that

$$A\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}x+y\\x-y\\0\end{bmatrix}$$

for all choices of x, y, and z?

Answer:

One; namely, $A = \begin{bmatrix}1&1&0\\1&-1&0\\0&0&0\end{bmatrix}$

28. How many 3x3 matrices A can you find such that 



~x ‘ 



A 

y 

= 

0 


z 


0 


for all choices of x, y, and z? 

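29. A matrix B is said to be a square root of a matrix A if BB = A.

(a) Find two square roots of $A = \begin{bmatrix}2&2\\2&2\end{bmatrix}$.

(b) How many different square roots can you find of $A = \begin{bmatrix}5&0\\0&9\end{bmatrix}$?

(c) Do you think that every 2 × 2 matrix has at least one square root? Explain your reasoning.

Answer:

(a) $\begin{bmatrix}1&1\\1&1\end{bmatrix}$ and $\begin{bmatrix}-1&-1\\-1&-1\end{bmatrix}$

(b) Four; $\begin{bmatrix}\sqrt{5}&0\\0&3\end{bmatrix}$, $\begin{bmatrix}-\sqrt{5}&0\\0&3\end{bmatrix}$, $\begin{bmatrix}\sqrt{5}&0\\0&-3\end{bmatrix}$, $\begin{bmatrix}-\sqrt{5}&0\\0&-3\end{bmatrix}$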


30. Let 0 denote a 2 × 2 matrix, each of whose entries is zero.

(a) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = 0? Justify your answer.

(b) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = A? Justify your answer.






























True-False Exercises 


In parts (a)-(o) determine whether the statement is true or false, and justify your answer. 

(a) The matrix $\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix}$ has no main diagonal.

Answer:

True

(b) An m × n matrix has m column vectors and n row vectors.

Answer:

False

(c) If A and B are 2 × 2 matrices, then AB = BA.
Answer:

False

(d) The ith row vector of a matrix product AB can be computed by multiplying A by the ith row vector of B.
Answer:

False

(e) For every matrix A, it is true that $(A^T)^T = A$.

Answer:

True

(f) If A and B are square matrices of the same order, then tr(AB) = tr(A)tr(B).

Answer:

False

(g) If A and B are square matrices of the same order, then $(AB)^T = A^TB^T$.

Answer:

False

(h) For every square matrix A, it is true that $\operatorname{tr}(A^T) = \operatorname{tr}(A)$.

Answer:

True

(i) If A is a 6 × 4 matrix and B is an m × n matrix such that $B^TA^T$ is a 2 × 6 matrix, then m = 4 and n = 2.

Answer:

True

(j) If A is an n × n matrix and c is a scalar, then tr(cA) = c tr(A).

Answer: 

True 

(k) If A, B, and C are matrices of the same size such that A − C = B − C, then A = B.
Answer: 

True 

(l) If A, B, and C are square matrices of the same order such that AC = BC, then A = B.
Answer: 

False 

(m) If AB + BA is defined, then A and B are square matrices of the same size.

Answer: 

True 

(n) If B has a column of zeros, then so does AB if this product is defined. 

Answer: 

True 

(o) If B has a column of zeros, then so does BA if this product is defined. 

Answer: 

False 
1.4 Inverses; Algebraic Properties of Matrices

In this section we will discuss some of the algebraic properties of matrix operations. We will see that many of 
the basic rules of arithmetic for real numbers hold for matrices, but we will also see that some do not. 


Properties of Matrix Addition and Scalar Multiplication 

The following theorem lists the basic algebraic properties of the matrix operations. 


THEOREM 1.4.1
Properties of Matrix Arithmetic

Assuming that the sizes of the matrices are such that the indicated operations can be performed, the
following rules of matrix arithmetic are valid.

(a) A + B = B + A (Commutative law for addition)
(b) A + (B + C) = (A + B) + C (Associative law for addition)
(c) A(BC) = (AB)C (Associative law for multiplication)
(d) A(B + C) = AB + AC (Left distributive law)
(e) (B + C)A = BA + CA (Right distributive law)
(f) A(B − C) = AB − AC
(g) (B − C)A = BA − CA
(h) a(B + C) = aB + aC
(i) a(B − C) = aB − aC
(j) (a + b)C = aC + bC
(k) (a − b)C = aC − bC
(l) a(bC) = (ab)C
(m) a(BC) = (aB)C = B(aC)


To prove any of the equalities in this theorem we must show that the matrix on the left side has the same size 
as that on the right and that the corresponding entries on the two sides are the same. Most of the proofs follow 
the same pattern, so we will prove part (d) as a sample. The proof of the associative law for multiplication is 
more complicated than the rest and is outlined in the exercises. 

There are three basic ways to prove that two 
matrices of the same size are equal—prove that 
corresponding entries are the same, prove that 
corresponding row vectors are the same, or 
prove that corresponding column vectors are 
the same. 


Proof (d) We must show that A(B + C) and AB + AC have the same size and that corresponding entries
are equal. To form A(B + C), the matrices B and C must have the same size, say m × n, and the matrix A
must then have m columns, so its size must be of the form r × m. This makes A(B + C) an r × n matrix. It
follows that AB + AC is also an r × n matrix and, consequently, A(B + C) and AB + AC have the same size.

Suppose that $A = [a_{ij}]$, $B = [b_{ij}]$, and $C = [c_{ij}]$. We want to show that corresponding entries of
A(B + C) and AB + AC are equal; that is,

$$[A(B+C)]_{ij} = [AB + AC]_{ij}$$

for all values of i and j. But from the definitions of matrix addition and matrix multiplication, we have

$$\begin{aligned}
[A(B+C)]_{ij} &= a_{i1}(b_{1j}+c_{1j}) + a_{i2}(b_{2j}+c_{2j}) + \cdots + a_{im}(b_{mj}+c_{mj})\\
&= (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj})\\
&= [AB]_{ij} + [AC]_{ij} = [AB + AC]_{ij}
\end{aligned}$$


Although the operations of matrix addition and matrix multiplication were defined for pairs of
matrices, associative laws (b) and (c) enable us to denote sums and products of three matrices as A + B + C
and ABC without inserting any parentheses. This is justified by the fact that no matter how parentheses are
inserted, the associative laws guarantee that the same end result will be obtained. In general, given any sum or
any product of matrices, pairs of parentheses can be inserted or deleted anywhere within the expression
without affecting the end result.


EXAMPLE 1 Associativity of Matrix Multiplication 


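As an illustration of the associative law for matrix multiplication, consider

$$A = \begin{bmatrix}1&2\\3&4\\0&1\end{bmatrix},\quad B = \begin{bmatrix}4&3\\2&1\end{bmatrix},\quad C = \begin{bmatrix}1&0\\2&3\end{bmatrix}$$

Then

$$AB = \begin{bmatrix}1&2\\3&4\\0&1\end{bmatrix}\begin{bmatrix}4&3\\2&1\end{bmatrix} = \begin{bmatrix}8&5\\20&13\\2&1\end{bmatrix} \quad\text{and}\quad BC = \begin{bmatrix}4&3\\2&1\end{bmatrix}\begin{bmatrix}1&0\\2&3\end{bmatrix} = \begin{bmatrix}10&9\\4&3\end{bmatrix}$$

Thus

$$(AB)C = \begin{bmatrix}8&5\\20&13\\2&1\end{bmatrix}\begin{bmatrix}1&0\\2&3\end{bmatrix} = \begin{bmatrix}18&15\\46&39\\4&3\end{bmatrix}$$

and

$$A(BC) = \begin{bmatrix}1&2\\3&4\\0&1\end{bmatrix}\begin{bmatrix}10&9\\4&3\end{bmatrix} = \begin{bmatrix}18&15\\46&39\\4&3\end{bmatrix}$$

so (AB)C = A(BC), as guaranteed by Theorem 1.4.1(c).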
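The arithmetic of Example 1 can be checked mechanically. The following is a minimal Python sketch, not part of the text, that stores a matrix as a list of rows; the helper `matmul` is a hypothetical name introduced only for this illustration, and it refuses any product whose inner sizes do not match.

```python
def matmul(X, Y):
    """Return XY for an m x r matrix X and an r x n matrix Y (lists of rows)."""
    if len(X[0]) != len(Y):
        raise ValueError("product is undefined: inner sizes differ")
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# The matrices of Example 1
A = [[1, 2], [3, 4], [0, 1]]
B = [[4, 3], [2, 1]]
C = [[1, 0], [2, 3]]

left = matmul(matmul(A, B), C)    # (AB)C
right = matmul(A, matmul(B, C))   # A(BC)

print(matmul(A, B))   # [[8, 5], [20, 13], [2, 1]]
print(matmul(B, C))   # [[10, 9], [4, 3]]
print(left == right)  # True
```

Grouping the factors differently changes the intermediate products, AB (3 × 2) versus BC (2 × 2), but not the final 3 × 2 result.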
































Properties of Matrix Multiplication 


Do not let Theorem 1.4.1 lull you into believing that all laws of real arithmetic carry over to matrix 
arithmetic. For example, you know that in real arithmetic it is always true that ab = ba, which is called the 
commutative law for multiplication. In matrix arithmetic, however, the equality of AB and BA can fail for 
three possible reasons: 

AB may be defined and BA may not (for example, if A is 2 × 3 and B is 3 × 4).

AB and BA may both be defined, but they may have different sizes (for example, if A is 2 × 3 and B is 3 × 2).

AB and BA may both be defined and have the same size, but the two matrices may be different (as
illustrated in the next example).

Do not read too much into Example 2; it does
not rule out the possibility that AB and BA may
be equal in certain cases, just that they are not
equal in all cases. If it so happens that
AB = BA, then we say that A and B
commute.


EXAMPLE 2 Order Matters in Matrix Multiplication 


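Consider the matrices

$$A = \begin{bmatrix}-1&0\\2&3\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}1&2\\3&0\end{bmatrix}$$

Multiplying gives

$$AB = \begin{bmatrix}-1&-2\\11&4\end{bmatrix} \quad\text{and}\quad BA = \begin{bmatrix}3&6\\-3&0\end{bmatrix}$$

Thus, AB ≠ BA.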
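A short Python sketch, offered only as an illustration, can reproduce Example 2 and contrast it with a pair that does commute. Matrices are lists of rows, and the hypothetical helper `matmul` assumes compatible sizes.

```python
def matmul(X, Y):
    """Return XY for matrices stored as lists of rows (sizes assumed compatible)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[-1, 0], [2, 3]]
B = [[1, 2], [3, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[-1, -2], [11, 4]]
print(BA)        # [[3, 6], [-3, 0]]
print(AB == BA)  # False: A and B do not commute

# By contrast, every 2 x 2 matrix commutes with the 2 x 2 identity matrix.
I2 = [[1, 0], [0, 1]]
print(matmul(A, I2) == matmul(I2, A) == A)  # True
```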


Zero Matrices 


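A matrix whose entries are all zero is called a zero matrix. Some examples are

$$\begin{bmatrix}0&0\\0&0\end{bmatrix},\quad \begin{bmatrix}0&0&0\\0&0&0\\0&0&0\end{bmatrix},\quad \begin{bmatrix}0&0&0&0\\0&0&0&0\end{bmatrix},\quad \begin{bmatrix}0\\0\\0\\0\end{bmatrix},\quad \begin{bmatrix}0\end{bmatrix}$$

We will denote a zero matrix by 0 unless it is important to specify its size, in which case we will denote the
m × n zero matrix by $0_{m\times n}$.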


















It should be evident that if A and 0 are matrices with the same size, then

$$A + 0 = 0 + A = A$$

Thus, 0 plays the same role in this matrix equation that the number 0 plays in the numerical equation
a + 0 = 0 + a = a.


The following theorem lists the basic properties of zero matrices. Since the results should be self-evident, we 
will omit the formal proofs. 


THEOREM 1.4.2
Properties of Zero Matrices

If c is a scalar, and if the sizes of the matrices are such that the operations can be performed, then:

(a) A + 0 = 0 + A = A

(b) A − 0 = A

(c) A − A = A + (−A) = 0

(d) 0A = 0

(e) If cA = 0, then c = 0 or A = 0.


Since we know that the commutative law of real arithmetic is not valid in matrix arithmetic, it should not be 
surprising that there are other rules that fail as well. For example, consider the following two laws of real 
arithmetic: 

If ab = ac and a ≠ 0, then b = c. [The cancellation law]

If ab = 0, then at least one of the factors on the left is 0.

The next two examples show that these laws are not universally true in matrix arithmetic. 

EXAMPLE 3 Failure of the Cancellation Law 


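Consider the matrices

$$A = \begin{bmatrix}0&1\\0&2\end{bmatrix},\quad B = \begin{bmatrix}1&1\\3&4\end{bmatrix},\quad C = \begin{bmatrix}2&5\\3&4\end{bmatrix}$$

We leave it for you to confirm that

$$AB = AC = \begin{bmatrix}3&4\\6&8\end{bmatrix}$$

Although A ≠ 0, canceling A from both sides of the equation AB = AC would lead to the
incorrect conclusion that B = C. Thus, the cancellation law does not hold, in general, for matrix
multiplication.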


EXAMPLE 4 A Zero Product with Nonzero Factors 










Here are two matrices for which AB = 0, but A ≠ 0 and B ≠ 0:

$$A = \begin{bmatrix}0&1\\0&2\end{bmatrix},\quad B = \begin{bmatrix}3&7\\0&0\end{bmatrix}$$
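Both failures, Example 3 and Example 4, can be confirmed with a few lines of Python. This is an illustrative sketch only. `matmul` is a hypothetical helper, and `B4` is a name introduced here for the matrix called B in Example 4, to keep it apart from the B of Example 3.

```python
def matmul(X, Y):
    """Return XY for matrices stored as lists of rows (sizes assumed compatible)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[0, 1], [0, 2]]
B = [[1, 1], [3, 4]]    # Example 3
C = [[2, 5], [3, 4]]    # Example 3
B4 = [[3, 7], [0, 0]]   # the matrix called B in Example 4

# Example 3: AB == AC although B != C, so A cannot be "canceled".
print(matmul(A, B), matmul(A, C), B == C)

# Example 4: a zero product whose factors are both nonzero.
print(matmul(A, B4))  # [[0, 0], [0, 0]]
```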


Identity Matrices 


A square matrix with 1's on the main diagonal and zeros elsewhere is called an identity matrix. Some
examples are

$$\begin{bmatrix}1\end{bmatrix},\quad \begin{bmatrix}1&0\\0&1\end{bmatrix},\quad \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix},\quad \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}$$

An identity matrix is denoted by the letter I. If it is important to emphasize the size, we will write $I_n$ for the
n × n identity matrix.


To explain the role of identity matrices in matrix arithmetic, let us consider the effect of multiplying a general
2 × 3 matrix A on each side by an identity matrix. Multiplying on the right by the 3 × 3 identity matrix yields

$$AI_3 = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{bmatrix}\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix} = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{bmatrix} = A$$

and multiplying on the left by the 2 × 2 identity matrix yields

$$I_2A = \begin{bmatrix}1&0\\0&1\end{bmatrix}\begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{bmatrix} = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{bmatrix} = A$$

The same result holds in general; that is, if A is any m × n matrix, then

$$AI_n = A \quad\text{and}\quad I_mA = A$$

Thus, the identity matrices play the same role in these matrix equations that the number 1 plays in the
numerical equation a · 1 = 1 · a = a.


As the next theorem shows, identity matrices arise naturally in studying reduced row echelon forms of square 
matrices. 


THEOREM 1.4.3 

If R is the reduced row echelon form of an n × n matrix A, then either R has a row of zeros or R is the
identity matrix $I_n$.
























Proof Suppose that the reduced row echelon form of A is

$$R = \begin{bmatrix}r_{11}&r_{12}&\cdots&r_{1n}\\r_{21}&r_{22}&\cdots&r_{2n}\\\vdots&\vdots&&\vdots\\r_{n1}&r_{n2}&\cdots&r_{nn}\end{bmatrix}$$

Either the last row in this matrix consists entirely of zeros or it does not. If not, the matrix contains no zero
rows, and consequently each of the n rows has a leading entry of 1. Since these leading 1's occur
progressively farther to the right as we move down the matrix, each of these 1's must occur on the main
diagonal. Since the other entries in the same column as one of these 1's are zero, R must be $I_n$. Thus, either R
has a row of zeros or R = $I_n$.
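Theorem 1.4.3 can be observed by row reducing a few square matrices. The sketch below is a plain Gauss-Jordan elimination written for illustration; `rref`, `identity`, and `has_zero_row` are hypothetical helpers, not functions from the text. It assumes Python's standard `fractions.Fraction` for exact arithmetic, so no rounding occurs. The two test matrices are those of Exercises 35 and 36 of this section.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows), computed exactly."""
    R = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(R), len(R[0])
    lead = 0  # index of the row that receives the next leading 1
    for col in range(n_cols):
        if lead == n_rows:
            break
        # Find a row at or below `lead` with a nonzero entry in this column.
        pivot = next((r for r in range(lead, n_rows) if R[r][col] != 0), None)
        if pivot is None:
            continue
        R[lead], R[pivot] = R[pivot], R[lead]   # interchange two rows
        p = R[lead][col]
        R[lead] = [x / p for x in R[lead]]      # introduce a leading 1
        for r in range(n_rows):                 # zero out the rest of the column
            if r != lead and R[r][col] != 0:
                factor = R[r][col]
                R[r] = [a - factor * b for a, b in zip(R[r], R[lead])]
        lead += 1
    return R

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def has_zero_row(M):
    return any(all(x == 0 for x in row) for row in M)

for M in ([[1, 0, 1], [1, 1, 0], [0, 1, 1]],    # reduces to I_3
          [[1, 1, 1], [1, 0, 0], [0, 1, 1]]):   # reduces to a matrix with a zero row
    R = rref(M)
    print(R == identity(3), has_zero_row(R))
```

In each case exactly one of the two alternatives of the theorem occurs.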


Inverse of a Matrix 

In real arithmetic every nonzero number a has a reciprocal $a^{-1}$ (= 1/a) with the property

$$a \cdot a^{-1} = a^{-1} \cdot a = 1$$

The number $a^{-1}$ is sometimes called the multiplicative inverse of a. Our next objective is to develop an
analog of this result for matrix arithmetic. For this purpose we make the following definition.


DEFINITION 1 

If A is a square matrix, and if a matrix B of the same size can be found such that AB = BA = I, then A
is said to be invertible (or nonsingular) and B is called an inverse of A. If no such matrix B can be
found, then A is said to be singular.


The relationship AB = BA = I is not changed by interchanging A and B, so if A is invertible and B
is an inverse of A, then it is also true that B is invertible, and A is an inverse of B. Thus, when

$$AB = BA = I$$

we say that A and B are inverses of one another.


EXAMPLE 5 An Invertible Matrix 


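Let

$$A = \begin{bmatrix}2&-5\\-1&3\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}3&5\\1&2\end{bmatrix}$$

Then

$$AB = \begin{bmatrix}2&-5\\-1&3\end{bmatrix}\begin{bmatrix}3&5\\1&2\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix} = I$$

$$BA = \begin{bmatrix}3&5\\1&2\end{bmatrix}\begin{bmatrix}2&-5\\-1&3\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix} = I$$

Thus, A and B are invertible and each is an inverse of the other.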


EXAMPLE 6 Class of Singular Matrices 


In general, a square matrix with a row or column of zeros is singular. To help understand why
this is so, consider the matrix

$$A = \begin{bmatrix}1&4&0\\2&5&0\\3&6&0\end{bmatrix}$$

To prove that A is singular we must show that there is no 3 × 3 matrix B such that AB = BA = I.
For this purpose let $\mathbf{c}_1$, $\mathbf{c}_2$, $\mathbf{0}$ be the column vectors of A. Thus, for any 3 × 3 matrix B we
can express the product BA as

$$BA = B[\mathbf{c}_1\ \mathbf{c}_2\ \mathbf{0}] = [B\mathbf{c}_1\ B\mathbf{c}_2\ \mathbf{0}] \qquad \text{[Formula (6) of Section 1.3]}$$

The column of zeros shows that BA ≠ I and hence that A is singular.


Properties of Inverses 

It is reasonable to ask whether an invertible matrix can have more than one inverse. The next theorem shows 
that the answer is no —an invertible matrix has exactly one inverse. 


THEOREM 1.4.4 

If B and C are both inverses of the matrix A, then B = C.

Proof Since B is an inverse of A, we have BA = I. Multiplying both sides on the right by C gives
(BA)C = IC = C. But it is also true that (BA)C = B(AC) = BI = B, so C = B.

As a consequence of this important result, we can now speak of “the” inverse of an invertible matrix. If A is
invertible, then its inverse will be denoted by the symbol $A^{-1}$. Thus,

$$AA^{-1} = I \quad\text{and}\quad A^{-1}A = I \tag{1}$$

















The inverse of A plays much the same role in matrix arithmetic that the reciprocal $a^{-1}$ plays in the numerical
relationships $aa^{-1} = 1$ and $a^{-1}a = 1$.

In the next section we will develop a method for computing the inverse of an invertible matrix of any size. 
For now we give the following theorem that specifies conditions under which a 2 x 2 matrix is invertible and 
provides a simple formula for its inverse. 


THEOREM 1.4.5 

The matrix

$$A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$$

is invertible if and only if ad − bc ≠ 0, in which case the inverse is given by the formula

$$A^{-1} = \frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix} \tag{2}$$

We will omit the proof, because we will study a more general version of this theorem later. For now, you
should at least confirm the validity of Formula (2) by showing that $AA^{-1} = A^{-1}A = I$.


The formula for $A^{-1}$ given in Theorem 1.4.5 first appeared (in a more general
form) in Arthur Cayley's 1858 Memoir on the Theory of Matrices. The more general result that 
Cayley discovered will be studied later. 


The quantity ad − bc in Theorem 1.4.5 is
called the determinant of the 2 × 2 matrix A
and is denoted by

$$\det(A) = ad - bc$$
or alternatively by

$$\begin{vmatrix}a&b\\c&d\end{vmatrix} = ad - bc$$


Figure 1.4.1 illustrates that the determinant of a 2 × 2 matrix A is the product of the entries on its
main diagonal minus the product of the entries off its main diagonal. In words, Theorem 1.4.5 states that a 
2x2 matrix A is invertible if and only if its determinant is nonzero, and if invertible, then its inverse can be 
obtained by interchanging its diagonal entries, reversing the signs of its off-diagonal entries, and multiplying 
the entries by the reciprocal of the determinant of A. 








$$\det(A) = \begin{vmatrix}a&b\\c&d\end{vmatrix} = ad - bc$$

Figure 1.4.1


EXAMPLE 7 Calculating the Inverse of a 2 x 2 Matrix 


In each part, determine whether the matrix is invertible. If so, find its inverse. 


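(a) $A = \begin{bmatrix}6&1\\5&2\end{bmatrix}$
(b) $A = \begin{bmatrix}-1&2\\3&-6\end{bmatrix}$

Solution

(a) The determinant of A is det(A) = (6)(2) − (1)(5) = 7, which is nonzero. Thus, A is
invertible, and its inverse is

$$A^{-1} = \frac{1}{7}\begin{bmatrix}2&-1\\-5&6\end{bmatrix} = \begin{bmatrix}\frac{2}{7}&-\frac{1}{7}\\-\frac{5}{7}&\frac{6}{7}\end{bmatrix}$$

We leave it for you to confirm that $AA^{-1} = A^{-1}A = I$.

(b) The matrix is not invertible since det(A) = (−1)(−6) − (2)(3) = 0.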
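Formula (2) translates directly into code. The following is a minimal Python sketch, assuming integer or rational entries, that uses the standard `fractions.Fraction` type so the result is exact. `det2` and `inverse_2x2` are hypothetical names chosen for this illustration, and returning `None` for a singular matrix is a design choice of the sketch, not something the text prescribes.

```python
from fractions import Fraction

def det2(M):
    """Determinant ad - bc of the 2 x 2 matrix M = [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    return a * d - b * c

def inverse_2x2(M):
    """Apply Formula (2); return None when det(M) == 0 (M is singular)."""
    (a, b), (c, d) = M
    det = det2(M)
    if det == 0:
        return None
    k = 1 / Fraction(det)          # exact reciprocal of the determinant
    return [[k * d, -k * b],       # interchange the diagonal entries,
            [-k * c, k * a]]       # negate the off-diagonal entries

print(inverse_2x2([[6, 1], [5, 2]]))    # Example 7(a): Fractions 2/7, -1/7, -5/7, 6/7
print(inverse_2x2([[-1, 2], [3, -6]]))  # Example 7(b): None
```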


EXAMPLE 8 Solution of a Linear System by Matrix Inversion 


A problem that arises in many applications is to solve a pair of equations of the form 

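$$u = ax + by$$
$$v = cx + dy$$

for x and y in terms of u and v. One approach is to treat this as a linear system of two equations in the
unknowns x and y and use Gauss-Jordan elimination to solve for x and y. However, because the
coefficients of the unknowns are literal rather than numerical, this procedure is a little clumsy. As an
alternative approach, let us replace the two equations by the single matrix equation

$$\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}ax+by\\cx+dy\end{bmatrix}$$

which we can rewrite as

$$\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$$

If we assume that the 2 × 2 matrix is invertible (i.e., ad − bc ≠ 0), then we can multiply through on
the left by the inverse and rewrite the equation as

$$\begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1}\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1}\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$$

which simplifies to

$$\begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1}\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}x\\y\end{bmatrix}$$

Using Theorem 1.4.5, we can rewrite this equation as

$$\frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}x\\y\end{bmatrix}$$

from which we obtain

$$x = \frac{du - bv}{ad - bc}, \qquad y = \frac{av - cu}{ad - bc}$$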
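The closing formulas of Example 8 give a direct solver for any 2 × 2 system with an invertible coefficient matrix. Here is a small Python sketch, assuming rational inputs. `solve_2x2` is a hypothetical name, and it is checked against Exercises 39 and 41 of this section.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, u, v):
    """Solve u = a*x + b*y, v = c*x + d*y for (x, y) with the formulas of Example 8."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("coefficient matrix is singular (ad - bc = 0)")
    det = Fraction(det)
    x = (d * u - b * v) / det
    y = (a * v - c * u) / det
    return x, y

# Exercise 39: 3x1 - 2x2 = -1, 4x1 + 5x2 = 3
x1, x2 = solve_2x2(3, -2, 4, 5, -1, 3)
print(x1, x2)  # 1/23 13/23
```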


The next theorem is concerned with inverses of matrix products. 

THEOREM 1.4.6 

If A and B are invertible matrices with the same size, then AB is invertible and

$$(AB)^{-1} = B^{-1}A^{-1}$$

Proof We can establish the invertibility and obtain the stated formula at the same time by showing that

$$(AB)(B^{-1}A^{-1}) = (B^{-1}A^{-1})(AB) = I$$

But

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$

and similarly, $(B^{-1}A^{-1})(AB) = I$.

Although we will not prove it, this result can be extended to three or more factors: 

A product of any number of invertible matrices is invertible, and the inverse of the product is the
product of the inverses in the reverse order.


EXAMPLE 9 The Inverse of a Product 














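Consider the matrices

$$A = \begin{bmatrix}1&2\\1&3\end{bmatrix},\quad B = \begin{bmatrix}3&2\\2&2\end{bmatrix}$$

We leave it for you to show that

$$AB = \begin{bmatrix}7&6\\9&8\end{bmatrix},\quad (AB)^{-1} = \begin{bmatrix}4&-3\\-\frac{9}{2}&\frac{7}{2}\end{bmatrix}$$

and also that

$$A^{-1} = \begin{bmatrix}3&-2\\-1&1\end{bmatrix},\quad B^{-1} = \begin{bmatrix}1&-1\\-1&\frac{3}{2}\end{bmatrix},\quad B^{-1}A^{-1} = \begin{bmatrix}1&-1\\-1&\frac{3}{2}\end{bmatrix}\begin{bmatrix}3&-2\\-1&1\end{bmatrix} = \begin{bmatrix}4&-3\\-\frac{9}{2}&\frac{7}{2}\end{bmatrix}$$

Thus, $(AB)^{-1} = B^{-1}A^{-1}$ as guaranteed by Theorem 1.4.6.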
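The role of the order of the factors in Theorem 1.4.6 is easy to see numerically. This Python sketch reuses the matrices of Example 9. The helpers `matmul` and `inverse_2x2` are hypothetical names written for the illustration, with the latter implementing Formula (2) exactly via `fractions.Fraction`.

```python
from fractions import Fraction

def matmul(X, Y):
    """Return XY for matrices stored as lists of rows (sizes assumed compatible)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Formula (2) for an invertible 2 x 2 matrix, with exact fractions."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    k = 1 / Fraction(det)
    return [[k * d, -k * b], [-k * c, k * a]]

A = [[1, 2], [1, 3]]
B = [[3, 2], [2, 2]]

lhs = inverse_2x2(matmul(A, B))                  # (AB)^{-1}
rhs = matmul(inverse_2x2(B), inverse_2x2(A))     # B^{-1} A^{-1}
wrong = matmul(inverse_2x2(A), inverse_2x2(B))   # A^{-1} B^{-1}: factors in the wrong order
print(lhs == rhs)    # True
print(lhs == wrong)  # False
```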


Powers of a Matrix 

If A is a square matrix, then we define the nonnegative integer powers of A to be 

$$A^0 = I \quad\text{and}\quad A^n = AA\cdots A \qquad [n \text{ factors}]$$
and if A is invertible, then we define the negative integer powers of A to be

$$A^{-n} = (A^{-1})^n = A^{-1}A^{-1}\cdots A^{-1} \qquad [n \text{ factors}]$$

Because these definitions parallel those for real numbers, the usual laws of nonnegative exponents hold; for
example,

$$A^rA^s = A^{r+s} \quad\text{and}\quad (A^r)^s = A^{rs}$$

If a product of matrices is singular, then at least 
one of the factors must be singular. Why? 

In addition, we have the following properties of negative exponents. 

THEOREM 1.4.7 

If A is invertible and n is a nonnegative integer, then:

(a) $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.




















(b) $A^n$ is invertible and $(A^n)^{-1} = A^{-n} = (A^{-1})^n$.

(c) kA is invertible for any nonzero scalar k, and $(kA)^{-1} = k^{-1}A^{-1}$.

We will prove part (c) and leave the proofs of parts (a) and (b) as exercises.

Proof (c) Properties (l) and (m) in Theorem 1.4.1 imply that

$$(kA)(k^{-1}A^{-1}) = k^{-1}(kA)A^{-1} = (k^{-1}k)AA^{-1} = (1)I = I$$

and similarly, $(k^{-1}A^{-1})(kA) = I$. Thus, kA is invertible and $(kA)^{-1} = k^{-1}A^{-1}$.


EXAMPLE 10 Properties of Exponents 

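Let A and $A^{-1}$ be the matrices in Example 9; that is,

$$A = \begin{bmatrix}1&2\\1&3\end{bmatrix} \quad\text{and}\quad A^{-1} = \begin{bmatrix}3&-2\\-1&1\end{bmatrix}$$

Then

$$A^{-3} = (A^{-1})^3 = \begin{bmatrix}3&-2\\-1&1\end{bmatrix}\begin{bmatrix}3&-2\\-1&1\end{bmatrix}\begin{bmatrix}3&-2\\-1&1\end{bmatrix} = \begin{bmatrix}41&-30\\-15&11\end{bmatrix}$$

Also,

$$A^3 = \begin{bmatrix}1&2\\1&3\end{bmatrix}\begin{bmatrix}1&2\\1&3\end{bmatrix}\begin{bmatrix}1&2\\1&3\end{bmatrix} = \begin{bmatrix}11&30\\15&41\end{bmatrix}$$

so, as expected from Theorem 1.4.7(b),

$$(A^3)^{-1} = \frac{1}{(11)(41) - (30)(15)}\begin{bmatrix}41&-30\\-15&11\end{bmatrix} = \begin{bmatrix}41&-30\\-15&11\end{bmatrix} = (A^{-1})^3$$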
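The definitions of $A^0$, $A^n$, and $A^{-n}$ can be implemented as a single function. This Python sketch is restricted to 2 × 2 matrices so that Formula (2) can supply $A^{-1}$. `mat_pow`, `matmul`, `identity`, and `inverse_2x2` are hypothetical helper names, and computing $A^n$ by repeated multiplication is the simplest choice, not the most efficient one.

```python
from fractions import Fraction

def matmul(X, Y):
    """Return XY for matrices stored as lists of rows (sizes assumed compatible)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def inverse_2x2(M):
    """Formula (2) for an invertible 2 x 2 matrix, with exact fractions."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    k = 1 / Fraction(det)
    return [[k * d, -k * b], [-k * c, k * a]]

def mat_pow(M, n):
    """Return M^n for a 2 x 2 matrix M and any integer n (n < 0 needs M invertible)."""
    if n < 0:
        M, n = inverse_2x2(M), -n   # A^{-n} = (A^{-1})^n
    result = identity(len(M))       # A^0 = I
    for _ in range(n):
        result = matmul(result, M)
    return result

A = [[1, 2], [1, 3]]
print(mat_pow(A, 3))   # [[11, 30], [15, 41]]
print(mat_pow(A, -3))  # as Fractions: 41, -30, -15, 11
```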


EXAMPLE 11 The Square of a Matrix Sum 

In real arithmetic, where we have a commutative law for multiplication, we can write 

$$(a+b)^2 = a^2 + ab + ba + b^2 = a^2 + ab + ab + b^2 = a^2 + 2ab + b^2$$

However, in matrix arithmetic, where we have no commutative law for multiplication, the best
we can do is to write

$$(A+B)^2 = A^2 + AB + BA + B^2$$

It is only in the special case where A and B commute (i.e., AB = BA) that we can go a step
further and write

$$(A+B)^2 = A^2 + 2AB + B^2$$
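To see Example 11 concretely, the sketch below reuses the non-commuting matrices of Example 2 and compares $(A+B)^2$ with both expansions. It is an illustration only. The helpers `matmul`, `matadd`, and `scale` are hypothetical names defined here.

```python
def matmul(X, Y):
    """Return XY for matrices stored as lists of rows (sizes assumed compatible)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def matadd(*Ms):
    """Entrywise sum of any number of same-size matrices."""
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*Ms)]

def scale(k, M):
    return [[k * x for x in row] for row in M]

A = [[-1, 0], [2, 3]]   # the matrices of Example 2, which do not commute
B = [[1, 2], [3, 0]]

S = matadd(A, B)
square = matmul(S, S)                                          # (A + B)^2
expanded = matadd(matmul(A, A), matmul(A, B), matmul(B, A), matmul(B, B))
shortcut = matadd(matmul(A, A), scale(2, matmul(A, B)), matmul(B, B))

print(square == expanded)  # True
print(square == shortcut)  # False, because AB != BA
```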



























Matrix Polynomials 

If A is a square matrix, say n × n, and if

$$p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m$$

is any polynomial, then we define the n × n matrix p(A) to be

$$p(A) = a_0I + a_1A + a_2A^2 + \cdots + a_mA^m \tag{3}$$

where I is the n × n identity matrix; that is, p(A) is obtained by substituting A for x and replacing the constant
term $a_0$ by the matrix $a_0I$. An expression of form (3) is called a matrix polynomial in A.

EXAMPLE 12 A Matrix Polynomial 


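Find p(A) for

$$p(x) = x^2 - 2x - 3 \quad\text{and}\quad A = \begin{bmatrix}-1&2\\0&3\end{bmatrix}$$

Solution

$$\begin{aligned}
p(A) &= A^2 - 2A - 3I\\
&= \begin{bmatrix}-1&2\\0&3\end{bmatrix}^2 - 2\begin{bmatrix}-1&2\\0&3\end{bmatrix} - 3\begin{bmatrix}1&0\\0&1\end{bmatrix}\\
&= \begin{bmatrix}1&4\\0&9\end{bmatrix} - \begin{bmatrix}-2&4\\0&6\end{bmatrix} - \begin{bmatrix}3&0\\0&3\end{bmatrix} = \begin{bmatrix}0&0\\0&0\end{bmatrix}
\end{aligned}$$

or more briefly, p(A) = 0.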


It follows from the fact that $A^rA^s = A^{r+s} = A^{s+r} = A^sA^r$ that powers of a square matrix
commute, and since a matrix polynomial in A is built up from powers of A, any two matrix polynomials in A
also commute; that is, for any polynomials $p_1$ and $p_2$ we have

$$p_1(A)p_2(A) = p_2(A)p_1(A) \tag{4}$$
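Formula (3) can be evaluated without forming each power separately by using Horner's rule: $p(A) = (\cdots((a_mA + a_{m-1}I)A + a_{m-2}I)\cdots)A + a_0I$. Horner's rule is our choice for this sketch, not a method from the text. The function `poly_at_matrix` is a hypothetical name that takes the coefficients with the constant term first. The sketch reproduces Example 12 and checks Formula (4) for $p_1(x) = x + 3$ and $p_2(x) = x - 3$.

```python
def matmul(X, Y):
    """Return XY for matrices stored as lists of rows (sizes assumed compatible)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def poly_at_matrix(coeffs, A):
    """Evaluate p(A) = a0*I + a1*A + ... + am*A^m by Horner's rule.

    coeffs lists a0, a1, ..., am (constant term first).
    """
    n = len(A)

    def scalar_I(c):
        return [[c if i == j else 0 for j in range(n)] for i in range(n)]

    result = scalar_I(coeffs[-1])                       # start with am * I
    for c in reversed(coeffs[:-1]):
        product = matmul(result, A)
        result = [[x + y for x, y in zip(r1, r2)]       # result*A + c*I
                  for r1, r2 in zip(product, scalar_I(c))]
    return result

A = [[-1, 2], [0, 3]]
print(poly_at_matrix([-3, -2, 1], A))  # p(x) = x^2 - 2x - 3  ->  [[0, 0], [0, 0]]

# Formula (4): two polynomials in the same matrix commute.
p1 = poly_at_matrix([3, 1], A)    # x + 3
p2 = poly_at_matrix([-3, 1], A)   # x - 3
print(matmul(p1, p2) == matmul(p2, p1))  # True
```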


Properties of the Transpose 




















The following theorem lists the main properties of the transpose. 


THEOREM 1.4.8 

If the sizes of the matrices are such that the stated operations can be performed, then:

(a) $(A^T)^T = A$

(b) $(A + B)^T = A^T + B^T$

(c) $(A - B)^T = A^T - B^T$

(d) $(kA)^T = kA^T$

(e) $(AB)^T = B^TA^T$


If you keep in mind that transposing a matrix interchanges its rows and columns, then you should have little
trouble visualizing the results in parts (a)-(d). For example, part (a) states the obvious fact that interchanging
rows and columns twice leaves a matrix unchanged; and part (b) states that adding two matrices and then
interchanging the rows and columns produces the same result as interchanging the rows and columns before
adding. We will omit the formal proofs. Part (e) is less obvious, but for brevity we will omit its proof as
well. The result in that part can be extended to three or more factors and restated as:


The transpose of a product of any number of matrices is the product of the transposes in the reverse 
order. 
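The reversal in Theorem 1.4.8(e) shows up even when the factors are not square. This Python sketch, an illustration only, takes A (3 × 2) and B (2 × 2) from Example 1. Here $B^TA^T$ is defined and matches $(AB)^T$, while $A^TB^T$ is a (2 × 3)(2 × 2) product and is not defined at all. `matmul` and `transpose` are hypothetical helper names.

```python
def matmul(X, Y):
    """Return XY, refusing products whose inner sizes differ."""
    if len(X[0]) != len(Y):
        raise ValueError("product is undefined: inner sizes differ")
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(M):
    """Interchange the rows and columns of M."""
    return [list(col) for col in zip(*M)]

A = [[1, 2], [3, 4], [0, 1]]   # 3 x 2 (from Example 1)
B = [[4, 3], [2, 1]]           # 2 x 2

print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True

try:
    matmul(transpose(A), transpose(B))   # (2 x 3)(2 x 2): undefined
except ValueError as err:
    print("A^T B^T:", err)
```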


The following theorem establishes a relationship between the inverse of a matrix and the inverse of its 
transpose. 


THEOREM 1.4.9

If A is an invertible matrix, then $A^T$ is also invertible and

$$(A^T)^{-1} = (A^{-1})^T$$

Proof We can establish the invertibility and obtain the formula at the same time by showing that

$$A^T(A^{-1})^T = (A^{-1})^TA^T = I$$

But from part (e) of Theorem 1.4.8 and the fact that $I^T = I$, we have

$$A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I$$
$$(A^{-1})^TA^T = (AA^{-1})^T = I^T = I$$

which completes the proof.


EXAMPLE 13 Inverse of a Transpose 

Consider a general 2 × 2 invertible matrix and its transpose:

$$A = \begin{bmatrix}a&b\\c&d\end{bmatrix} \quad\text{and}\quad A^T = \begin{bmatrix}a&c\\b&d\end{bmatrix}$$

Since A is invertible, its determinant ad − bc is nonzero. But the determinant of $A^T$ is also
ad − bc (verify), so $A^T$ is also invertible. It follows from Theorem 1.4.5 that

$$(A^T)^{-1} = \begin{bmatrix}\frac{d}{ad-bc}&-\frac{c}{ad-bc}\\-\frac{b}{ad-bc}&\frac{a}{ad-bc}\end{bmatrix}$$

which is the same matrix that results if $A^{-1}$ is transposed (verify). Thus,

$$(A^T)^{-1} = (A^{-1})^T$$

as guaranteed by Theorem 1.4.9.
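The two "(verify)" steps of Example 13 can be spot-checked on concrete matrices. This Python sketch compares $(A^T)^{-1}$ with $(A^{-1})^T$ for the invertible 2 × 2 matrices of Examples 5, 7(a), and 9. `transpose` and `inverse_2x2` are hypothetical helpers, the latter implementing Formula (2) with exact fractions. A numerical check like this illustrates the theorem but does not replace the proof.

```python
from fractions import Fraction

def transpose(M):
    """Interchange the rows and columns of M."""
    return [list(col) for col in zip(*M)]

def inverse_2x2(M):
    """Formula (2) for an invertible 2 x 2 matrix, with exact fractions."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    k = 1 / Fraction(det)
    return [[k * d, -k * b], [-k * c, k * a]]

samples = [[[2, -5], [-1, 3]],   # Example 5
           [[6, 1], [5, 2]],     # Example 7(a)
           [[1, 2], [1, 3]]]     # Example 9
for A in samples:
    print(inverse_2x2(transpose(A)) == transpose(inverse_2x2(A)))  # True each time
```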


Concept Review 

Commutative law for matrix addition 

Associative law for matrix addition 

Associative law for matrix multiplication 

Left and right distributive laws 

Zero matrix 

Identity matrix 

Inverse of a matrix 

Invertible matrix 

Nonsingular matrix 

Singular matrix 

Determinant 

Power of a matrix 











Matrix polynomial 

Skills 

Know the arithmetic properties of matrix operations. 

Be able to prove arithmetic properties of matrices. 

Know the properties of zero matrices. 

Know the properties of identity matrices. 

Be able to recognize when two square matrices are inverses of each other. 

Be able to determine whether a 2 x 2 matrix is invertible. 

Be able to solve a linear system of two equations in two unknowns whose coefficient matrix is 
invertible. 

Be able to prove basic properties involving invertible matrices. 

Know the properties of the matrix transpose and its relationship with invertible matrices. 


Exercise Set 1.4 

1. Let

$$A = \begin{bmatrix}2&-1&3\\0&4&5\\-2&1&4\end{bmatrix},\quad B = \begin{bmatrix}8&-3&-5\\0&1&2\\4&-7&6\end{bmatrix},\quad C = \begin{bmatrix}0&-2&3\\1&7&4\\3&5&9\end{bmatrix}$$


Show that 

(a) A+(B + C) = (A + B) + C 

(b) (AB)C = A(BC) 

(c) (a + b)C = aC + bC
(d) a(B − C) = aB − aC

2. Using the matrices and scalars in Exercise 1, verify that 

(a) a(BC) = (aB)C = B(aC) 

(b) A(B-C)=AB-AC 
(c) (B + C)A = BA + CA
(d) a(bC) = (ab)C

3. Using the matrices and scalars in Exercise 1, verify that 



(a) $(A^T)^T = A$

(b) $(A + B)^T = A^T + B^T$

(c) $(aC)^T = aC^T$

(d) $(AB)^T = B^TA^T$








In Exercises 4-7 use Theorem 1.4.5 to compute the inverses of the following matrices. 



Answer:

$$B^{-1} = \begin{bmatrix}\frac{1}{5}&\frac{3}{20}\\-\frac{1}{5}&\frac{1}{10}\end{bmatrix}$$



8. Find the inverse of

$$\begin{bmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}$$

9. Find the inverse of 



Answer:

$$\begin{bmatrix}\frac{1}{2}(e^x+e^{-x})&-\frac{1}{2}(e^x-e^{-x})\\-\frac{1}{2}(e^x-e^{-x})&\frac{1}{2}(e^x+e^{-x})\end{bmatrix}$$

10. Use the matrix A in Exercise 4 to verify that $(A^T)^{-1} = (A^{-1})^T$.

11. Use the matrix B in Exercise 5 to verify that $(B^T)^{-1} = (B^{-1})^T$.

12. Use the matrices A and B in Exercises 4 and 5 to verify that $(AB)^{-1} = B^{-1}A^{-1}$.

13. Use the matrices A, B, and C in Exercises 4-6 to verify that $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$.
In Exercises 14-17, use the given information to find A. 


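14. $A^{-1} = \begin{bmatrix}2&-1\\3&5\end{bmatrix}$

15. $(7A)^{-1} = \begin{bmatrix}-3&7\\1&-2\end{bmatrix}$

Answer:

$A = \begin{bmatrix}\frac{2}{7}&1\\\frac{1}{7}&\frac{3}{7}\end{bmatrix}$

16. $(5A^T)^{-1} = \begin{bmatrix}-3&-1\\5&2\end{bmatrix}$

17. $(I + 2A)^{-1} = \begin{bmatrix}-1&2\\4&5\end{bmatrix}$

Answer:

$A = \begin{bmatrix}-\frac{9}{13}&\frac{1}{13}\\\frac{2}{13}&-\frac{6}{13}\end{bmatrix}$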

18. Let A be the matrix

$$\begin{bmatrix}2&0\\4&1\end{bmatrix}$$

In each part, compute the given quantity.

(a) $A^3$

(b) $A^{-3}$

(c) $A^2 - 2A + I$

(d) p(A), where p(x) = x − 2

(e) p(A), where $p(x) = 2x^2 - x + 1$

(f) p(A), where $p(x) = x^3 - 2x + 4$

19 . Repeat Exercise 18 for the matrix 


Answer: 




20. Repeat Exercise 18 for the matrix 


21. Repeat Exercise 18 for the matrix 


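Answer:

(a) $\begin{bmatrix}27&0&0\\0&26&-18\\0&18&26\end{bmatrix}$

(b) $\begin{bmatrix}\frac{1}{27}&0&0\\0&0.026&0.018\\0&-0.018&0.026\end{bmatrix}$

(c) $\begin{bmatrix}4&0&0\\0&-5&-12\\0&12&-5\end{bmatrix}$

(d) $\begin{bmatrix}1&0&0\\0&-3&3\\0&-3&-3\end{bmatrix}$

(e) $\begin{bmatrix}16&0&0\\0&-14&-15\\0&15&-14\end{bmatrix}$

(f) $\begin{bmatrix}25&0&0\\0&32&-24\\0&24&32\end{bmatrix}$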

In Exercises 22-24, let $p_1(x) = x^2 - 9$, $p_2(x) = x + 3$, and $p_3(x) = x - 3$. Show that
$p_1(A) = p_2(A)p_3(A)$ for the given matrix.

22. The matrix A in Exercise 18.

23. The matrix A in Exercise 21. 

24. An arbitrary square matrix A. 

25. Show that if $p(x) = x^2 - (a + d)x + (ad - bc)$ and

$$A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$$

then p(A) = 0.

26. Show that if $p(x) = x^3 - (a + b + e)x^2 + (ab + ae + be - cd)x - a(be - cd)$ and

$$A = \begin{bmatrix}a&0&0\\0&b&c\\0&d&e\end{bmatrix}$$

then p(A) = 0.

27. Consider the matrix 


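$$A = \begin{bmatrix}a_{11}&0&\cdots&0\\0&a_{22}&\cdots&0\\\vdots&\vdots&&\vdots\\0&0&\cdots&a_{nn}\end{bmatrix}$$

where $a_{11}a_{22}\cdots a_{nn} \ne 0$. Show that A is invertible and find its inverse.

Answer:

$$A^{-1} = \begin{bmatrix}\frac{1}{a_{11}}&0&\cdots&0\\0&\frac{1}{a_{22}}&\cdots&0\\\vdots&\vdots&&\vdots\\0&0&\cdots&\frac{1}{a_{nn}}\end{bmatrix}$$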

28. Show that if a square matrix A satisfies $A^2 - 3A + I = 0$, then $A^{-1} = 3I - A$.

29. (a) Show that a matrix with a row of zeros cannot have an inverse.

(b) Show that a matrix with a column of zeros cannot have an inverse.

30. Assuming that all matrices are n × n and invertible, solve for D.

$$ABC^TDBA^TC = AB^T$$




Answer: 

B~ l 

34. Simplify:

$$(AC^{-1})^{-1}(AC^{-1})(AC^{-1})^{-1}AD^{-1}$$

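In Exercises 35-37, determine whether A is invertible, and if so, find the inverse. [Hint: Solve AX = I for X
by equating corresponding entries on the two sides.]

35. $$A = \begin{bmatrix}1&0&1\\1&1&0\\0&1&1\end{bmatrix}$$

Answer:

$$A^{-1} = \begin{bmatrix}\frac{1}{2}&\frac{1}{2}&-\frac{1}{2}\\-\frac{1}{2}&\frac{1}{2}&\frac{1}{2}\\\frac{1}{2}&-\frac{1}{2}&\frac{1}{2}\end{bmatrix}$$

36. $$A = \begin{bmatrix}1&1&1\\1&0&0\\0&1&1\end{bmatrix}$$

37. $$A = \begin{bmatrix}0&0&1\\1&1&0\\-1&1&1\end{bmatrix}$$

Answer:

$$A^{-1} = \begin{bmatrix}\frac{1}{2}&\frac{1}{2}&-\frac{1}{2}\\-\frac{1}{2}&\frac{1}{2}&\frac{1}{2}\\1&0&0\end{bmatrix}$$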

38. Prove Theorem 1.4.2. 

In Exercises 39-42, use the method of Example 8 to find the unique solution of the given linear system. 

39. $3x_1 - 2x_2 = -1$
$4x_1 + 5x_2 = 3$

Answer:

$x_1 = \frac{1}{23},\ x_2 = \frac{13}{23}$

40. $-x_1 + 5x_2 = 4$
$-x_1 - 3x_2 = 1$

41. $6x_1 + x_2 = 0$
$4x_1 - 3x_2 = -2$

Answer:

$x_1 = -\frac{1}{11},\ x_2 = \frac{6}{11}$

42. $2x_1 - 2x_2 = 4$
$x_1 + 4x_2 = 4$

43. Prove part (a) of Theorem 1.4.1. 

44. Prove part (c) of Theorem 1.4.1. 

45. Prove part (f) of Theorem 1.4.1. 

46. Prove part (b) of Theorem 1.4.2. 

47. Prove part (c) of Theorem 1.4.2. 

48. Verify Formula (4) in the text by a direct calculation.

49. Prove part (d) of Theorem 1.4.8. 

50. Prove part (e) of Theorem 1.4.8. 

51. (a) Show that if A is invertible and AB = AC, then B = C.

(b) Explain why part (a) and Example 3 do not contradict one another.

52. Show that if A is invertible and k is any nonzero scalar, then $(kA)^n = k^nA^n$ for all integer values of n.

53. (a) Show that if A, B, and A + B are invertible matrices with the same size, then

$$A(A^{-1} + B^{-1})B(A + B)^{-1} = I$$

(b) What does the result in part (a) tell you about the matrix $A^{-1} + B^{-1}$?

54. A square matrix A is said to be idempotent if $A^2 = A$.

(a) Show that if A is idempotent, then so is I − A.

(b) Show that if A is idempotent, then 2A − I is invertible and is its own inverse.

55. Show that if A is a square matrix such that $A^k = 0$ for some positive integer k, then the matrix I − A is
invertible and

$$(I - A)^{-1} = I + A + A^2 + \cdots + A^{k-1}$$

True-False Exercises 

In parts (a)-(k) determine whether the statement is true or false, and justify your answer. 

(a) Two n × n matrices, A and B, are inverses of one another if and only if AB = BA = 0.
Answer: 

False 

(b) For all square matrices A and B of the same size, it is true that $(A + B)^2 = A^2 + 2AB + B^2$.

Answer: 

False 

(c) For all square matrices A and B of the same size, it is true that $A^2 - B^2 = (A - B)(A + B)$.

Answer: 

False 

(d) If A and B are invertible matrices of the same size, then AB is invertible and $(AB)^{-1} = A^{-1}B^{-1}$.

Answer: 

False 

(e) If A and B are matrices such that AB is defined, then it is true that $(AB)^T = A^TB^T$.

Answer: 

False 

(f) The matrix

$$A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$$

is invertible if and only if ad − bc ≠ 0.

Answer: 


True 




(g) If A and B are matrices of the same size and k is a constant, then $(kA + B)^T = kA^T + B^T$.


Answer: 

True 

(h) If A is an invertible matrix, then so is $A^T$.
Answer: 

True 

(i) If $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m$ and I is an identity matrix, then

$$p(I) = a_0 + a_1 + a_2 + \cdots + a_m$$

Answer: 

False 

(j) A square matrix containing a row or column of zeros cannot be invertible.
Answer: 

True 

(k) The sum of two invertible matrices of the same size must be invertible. 
Answer: 

False 
1.5 Elementary Matrices and a Method for Finding $A^{-1}$



In this section we will develop an algorithm for finding the inverse of a matrix, and we will discuss some of the 
basic properties of invertible matrices. 

In Section 1.1 we defined three elementary row operations on a matrix A: 

Multiply a row by a nonzero constant c. 

Interchange two rows. 

Add a constant c times one row to another. 

It should be evident that if we let B be the matrix that results from A by performing one of the operations in this 
list, then the matrix A can be recovered from B by performing the corresponding operation in the following list: 

Multiply the same row by $1/c$.

Interchange the same two rows. 

If $B$ resulted by adding $c$ times row $r_1$ of $A$ to row $r_2$, then add $-c$ times $r_1$ to $r_2$.
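The claim that every row operation can be undone by its inverse operation can be checked on a small case. The sketch below is only an illustration: the helper names `scale`, `swap`, and `add_multiple` and the $3 \times 3$ test matrix are our own choices, rows are 0-indexed in the code (unlike the text), and `Fraction` is used so that dividing by $c$ stays exact.

```python
from fractions import Fraction

# Each elementary row operation returns a new matrix (a list of rows).
# Rows are 0-indexed in the code, unlike the 1-indexed rows in the text.

def scale(M, i, c):
    """Multiply row i by a nonzero constant c."""
    if c == 0:
        raise ValueError("c must be nonzero")
    return [[c * x for x in row] if k == i else list(row) for k, row in enumerate(M)]

def swap(M, i, j):
    """Interchange rows i and j."""
    R = [list(row) for row in M]
    R[i], R[j] = R[j], R[i]
    return R

def add_multiple(M, c, src, dst):
    """Add c times row src to row dst (src != dst)."""
    return [[x + c * y for x, y in zip(row, M[src])] if k == dst else list(row)
            for k, row in enumerate(M)]

A = [[Fraction(v) for v in row] for row in [[1, 0, 2], [2, -1, 3], [1, 4, 4]]]

# Apply three operations in sequence...
B = add_multiple(swap(scale(A, 0, Fraction(3)), 1, 2), Fraction(-2), 0, 1)

# ...then undo them with the inverse operations, applied in reverse order.
recovered = scale(swap(add_multiple(B, Fraction(2), 0, 1), 1, 2), 0, Fraction(1, 3))

print(recovered == A)  # True
```

Note that the inverse operations must be applied in the reverse of the original order, which is why the last operation performed is the first one undone.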

It follows that if B is obtained from A by performing a sequence of elementary row operations, then there is a 
second sequence of elementary row operations, which when applied to B recovers A (Exercise 43). Accordingly, 
we make the following definition. 


DEFINITION 1 

Matrices A and B are said to be row equivalent if either (hence each) can be obtained from the other by 
a sequence of elementary row operations. 




Our next goal is to show how matrix multiplication can be used to carry out an elementary row operation. 



DEFINITION 2 

An $n \times n$ matrix is called an elementary matrix if it can be obtained from the $n \times n$ identity matrix $I_n$ by performing a single elementary row operation.




EXAMPLE 1 Elementary Matrices and Row Operations 


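Listed below are four elementary matrices and the operations that produce them.

$\begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}$: Multiply the second row of $I_2$ by $-3$.

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$: Interchange the second and fourth rows of $I_4$.

$\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$: Add 3 times the third row of $I_3$ to the first row.

$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$: Multiply the first row of $I_3$ by 1.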
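Since an elementary matrix is just $I_n$ with one row operation applied, it can be built directly. The following is a minimal sketch, assuming hypothetical helper names `e_scale`, `e_swap`, and `e_add` and 0-indexed rows; it reproduces the four matrices of Example 1.

```python
def identity(n):
    """The n x n identity matrix I_n as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def e_scale(n, i, c):
    """Elementary matrix: multiply row i of I_n by c (c must be nonzero)."""
    if c == 0:
        raise ValueError("c must be nonzero")
    E = identity(n)
    E[i][i] = c
    return E

def e_swap(n, i, j):
    """Elementary matrix: interchange rows i and j of I_n."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def e_add(n, c, src, dst):
    """Elementary matrix: add c times row src of I_n to row dst."""
    E = identity(n)
    E[dst][src] += c
    return E

# The four matrices of Example 1 (rows are 0-indexed in the code).
print(e_scale(2, 1, -3))  # [[1, 0], [0, -3]]
print(e_swap(4, 1, 3))    # [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
print(e_add(3, 3, 2, 0))  # [[1, 0, 3], [0, 1, 0], [0, 0, 1]]
print(e_scale(3, 0, 1))   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

The last line shows why the fourth matrix in Example 1 counts as elementary: multiplying a row by 1 is a legitimate (if trivial) row operation.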


The following theorem, whose proof is left as an exercise, shows that when a matrix $A$ is multiplied on the left by an elementary matrix $E$, the effect is to perform an elementary row operation on $A$.


THEOREM 1.5.1 Row Operations by Matrix Multiplication

If the elementary matrix $E$ results from performing a certain row operation on $I_m$ and if $A$ is an $m \times n$ matrix, then the product $EA$ is the matrix that results when this same row operation is performed on $A$.


EXAMPLE 2 Using Elementary Matrices 


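Consider the matrix

$$A = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 1 & 4 & 4 & 0 \end{bmatrix}$$

and consider the elementary matrix

$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}$$

which results from adding 3 times the first row of $I_3$ to the third row. The product $EA$ is

$$EA = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 4 & 4 & 10 & 9 \end{bmatrix}$$

which is precisely the same matrix that results when we add 3 times the first row of $A$ to the third row.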
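Example 2 can be checked mechanically by comparing the product $EA$ with the result of performing the row operation directly. This is a sketch only; `matmul` is a small helper written for the illustration and represents matrices as lists of rows.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

A = [[1, 0, 2, 3], [2, -1, 3, 6], [1, 4, 4, 0]]
E = [[1, 0, 0], [0, 1, 0], [3, 0, 1]]  # add 3 times row 1 of I_3 to row 3

# Perform the same row operation directly on A (rows are 0-indexed here).
direct = [list(r) for r in A]
direct[2] = [a + 3 * b for a, b in zip(A[2], A[0])]

print(matmul(E, A))            # [[1, 0, 2, 3], [2, -1, 3, 6], [4, 4, 10, 9]]
print(matmul(E, A) == direct)  # True
```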


Theorem 1.5.1 will be a useful tool for 
developing new results about matrices, 
but as a practical matter it is usually 
preferable to perform row operations 
directly. 
















We know from the discussion at the beginning of this section that if $E$ is an elementary matrix that results from performing an elementary row operation on an identity matrix $I$, then there is a second elementary row operation, which when applied to $E$, produces $I$ back again. Table 1 lists these operations. The operations on the right side of the table are called the inverse operations of the corresponding operations on the left.


Table 1

Row Operation on $I$ That Produces $E$ | Row Operation on $E$ That Reproduces $I$
Multiply row $i$ by $c \neq 0$ | Multiply row $i$ by $1/c$
Interchange rows $i$ and $j$ | Interchange rows $i$ and $j$
Add $c$ times row $i$ to row $j$ | Add $-c$ times row $i$ to row $j$


EXAMPLE 3 Row Operations and Inverse Row Operations 

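In each of the following, an elementary row operation is applied to the $2 \times 2$ identity matrix to obtain an elementary matrix $E$, then $E$ is restored to the identity matrix by applying the inverse row operation.

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & 7 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Multiply the second row by 7; then multiply the second row by $\frac{1}{7}$.

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Interchange the first and second rows; then interchange the first and second rows.

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Add 5 times the second row to the first; then add $-5$ times the second row to the first.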




The next theorem is a key result about invertibility of elementary matrices. It will be a building block for many 
results that follow. 




























THEOREM 1.5.2 


Every elementary matrix is invertible, and the inverse is also an elementary matrix. 


If $E$ is an elementary matrix, then $E$ results by performing some row operation on $I$. Let $E_0$ be the matrix that results when the inverse of this operation is performed on $I$. Applying Theorem 1.5.1 and using the fact that inverse row operations cancel the effect of each other, it follows that

$$E_0E = I \quad\text{and}\quad EE_0 = I$$

Thus, the elementary matrix $E_0$ is the inverse of $E$.
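The proof can be checked on the three pairs from Example 3: in each case the matrix $E_0$ produced by the inverse operation satisfies both $E_0E = I$ and $EE_0 = I$. This is an illustrative sketch with our own `matmul` helper; `Fraction` keeps $\frac{1}{7}$ exact.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

I2 = [[1, 0], [0, 1]]
pairs = [
    # (E, E0): E from a row operation on I_2, E0 from the inverse operation
    ([[1, 0], [0, 7]], [[1, 0], [0, Fraction(1, 7)]]),  # scale row 2 by 7 / by 1/7
    ([[0, 1], [1, 0]], [[0, 1], [1, 0]]),               # interchange rows 1 and 2
    ([[1, 5], [0, 1]], [[1, -5], [0, 1]]),              # add 5 / add -5 times row 2 to row 1
]

for E, E0 in pairs:
    assert matmul(E0, E) == I2 and matmul(E, E0) == I2
print("each E0 is the inverse of its E")
```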


Equivalence Theorem 

One of our objectives as we progress through this text is to show how seemingly diverse ideas in linear algebra 
are related. The following theorem, which relates results we have obtained about invertibility of matrices, 
homogeneous linear systems, reduced row echelon forms, and elementary matrices, is our first step in that 
direction. As we study new topics, more statements will be added to this theorem. 


THEOREM 1.5.3 Equivalent Statements

If $A$ is an $n \times n$ matrix, then the following statements are equivalent, that is, all true or all false.

(a) $A$ is invertible.

(b) $Ax = 0$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.


It may make the logic of our proof of Theorem 1.5.3 more apparent by writing the implications

(a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) $\Rightarrow$ (a)

This makes it evident visually that the validity of any one statement implies the validity of all the others, and hence that the falsity of any one implies the falsity of the others.


We will prove the equivalence by establishing the chain of implications: (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) $\Rightarrow$ (a).

(a) $\Rightarrow$ (b) Assume $A$ is invertible and let $x_0$ be any solution of $Ax = 0$. Multiplying both sides of this equation by the matrix $A^{-1}$ gives $A^{-1}(Ax_0) = A^{-1}0$, or $(A^{-1}A)x_0 = 0$, or $Ix_0 = 0$, or $x_0 = 0$. Thus, $Ax = 0$ has only the trivial solution.

(b) $\Rightarrow$ (c) Let $Ax = 0$ be the matrix form of the system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0 \end{aligned} \tag{1}$$

and assume that the system has only the trivial solution. If we solve by Gauss-Jordan elimination, then the system of equations corresponding to the reduced row echelon form of the augmented matrix will be

$$x_1 = 0,\quad x_2 = 0,\quad \ldots,\quad x_n = 0 \tag{2}$$

Thus the augmented matrix

$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 \end{array}\right]$$

for (1) can be reduced to the augmented matrix

$$\left[\begin{array}{cccc|c} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{array}\right]$$

for (2) by a sequence of elementary row operations. If we disregard the last column (all zeros) in each of these matrices, we can conclude that the reduced row echelon form of $A$ is $I_n$.

(c) $\Rightarrow$ (d) Assume that the reduced row echelon form of $A$ is $I_n$, so that $A$ can be reduced to $I_n$ by a finite sequence of elementary row operations. By Theorem 1.5.1, each of these operations can be accomplished by multiplying on the left by an appropriate elementary matrix. Thus we can find elementary matrices $E_1, E_2, \ldots, E_k$ such that

$$E_k \cdots E_2 E_1 A = I_n \tag{3}$$

By Theorem 1.5.2, $E_1, E_2, \ldots, E_k$ are invertible. Multiplying both sides of Equation 3 on the left successively by $E_k^{-1}, \ldots, E_2^{-1}, E_1^{-1}$ we obtain

$$A = E_1^{-1}E_2^{-1}\cdots E_k^{-1}I_n = E_1^{-1}E_2^{-1}\cdots E_k^{-1} \tag{4}$$

By Theorem 1.5.2, this equation expresses $A$ as a product of elementary matrices.

(d) $\Rightarrow$ (a) If $A$ is a product of elementary matrices, then from Theorem 1.4.7 and Theorem 1.5.2, the matrix $A$ is a product of invertible matrices and hence is invertible.
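The step (c) $\Rightarrow$ (d) is constructive: if we record each row operation while reducing $A$ to $I_n$, Equation 4 hands us the factorization. The sketch below assumes a simple Gauss-Jordan pivoting order of our own choosing (first nonzero entry in each column) and a hypothetical function name `inverse_elementary_factors`; it stores $E_i^{-1}$ directly, using the inverse operations of Table 1, and returns `None` when the reduced row echelon form is not $I_n$. Other pivoting orders give different, equally valid factorizations.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def inverse_elementary_factors(A):
    """Reduce A to I_n, recording E_i^{-1} for each elementary matrix E_i used.
    Returns [E_1^{-1}, ..., E_k^{-1}] so that A = E_1^{-1} E_2^{-1} ... E_k^{-1},
    or None if the reduced row echelon form of A is not I_n."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    factors = []
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        if pivot != col:
            # Interchange rows: this elementary matrix is its own inverse.
            M[col], M[pivot] = M[pivot], M[col]
            F = identity(n)
            F[col], F[pivot] = F[pivot], F[col]
            factors.append(F)
        c = M[col][col]
        if c != 1:
            # Multiply the row by 1/c; the inverse operation multiplies by c.
            M[col] = [x / c for x in M[col]]
            F = identity(n)
            F[col][col] = c
            factors.append(F)
        for r in range(n):
            m = M[r][col]
            if r != col and m != 0:
                # Add -m times row col to row r; the inverse adds +m times.
                M[r] = [x - m * y for x, y in zip(M[r], M[col])]
                F = identity(n)
                F[r][col] = m
                factors.append(F)
    return factors

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
factors = inverse_elementary_factors(A)
product = identity(3)
for F in factors:
    product = matmul(product, F)
print(len(factors), product == A)  # 7 True
```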


A Method for Inverting Matrices 

As a first application of Theorem 1.5.3, we will develop a procedure (or algorithm) that can be used to tell whether a given matrix is invertible, and if so, produce its inverse. To derive this algorithm, assume for the moment that $A$ is an invertible $n \times n$ matrix. In Equation 3, the elementary matrices execute a sequence of row operations that reduce $A$ to $I_n$. If we multiply both sides of this equation on the right by $A^{-1}$ and simplify, we obtain

$$A^{-1} = E_k \cdots E_2 E_1 I_n$$

But this equation tells us that the same sequence of row operations that reduces $A$ to $I_n$ will transform $I_n$ to $A^{-1}$. Thus, we have established the following result.


Inversion Algorithm 

To find the inverse of an invertible matrix $A$, find a sequence of elementary row operations that reduces $A$ to the identity and then perform that same sequence of operations on $I_n$ to obtain $A^{-1}$.


A simple method for carrying out this procedure is given in the following example. 


EXAMPLE 4 Using Row Operations to Find $A^{-1}$


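Find the inverse of

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$$

We want to reduce $A$ to the identity matrix by row operations and simultaneously apply these operations to $I$ to produce $A^{-1}$. To accomplish this we will adjoin the identity matrix to the right side of $A$, thereby producing a partitioned matrix of the form

$$[A \mid I]$$

Then we will apply row operations to this matrix until the left side is reduced to $I$; these operations will convert the right side to $A^{-1}$, so the final matrix will have the form

$$[I \mid A^{-1}]$$

The computations are as follows:

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 3 & 0 & 1 & 0 \\ 1 & 0 & 8 & 0 & 0 & 1 \end{array}\right]$$

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & -2 & 5 & -1 & 0 & 1 \end{array}\right]$$

We added $-2$ times the first row to the second and $-1$ times the first row to the third.

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & -1 & -5 & 2 & 1 \end{array}\right]$$

We added 2 times the second row to the third.

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We multiplied the third row by $-1$.

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & -14 & 6 & 3 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We added 3 times the third row to the second and $-3$ times the third row to the first.

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -40 & 16 & 9 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We added $-2$ times the second row to the first.

Thus,

$$A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$$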

Often it will not be known in advance if a given $n \times n$ matrix $A$ is invertible. However, if it is not, then by parts (a) and (c) of Theorem 1.5.3 it will be impossible to reduce $A$ to $I_n$ by elementary row operations. This will be signaled by a row of zeros appearing on the left side of the partition at some stage of the inversion algorithm. If this occurs, then you can stop the computations and conclude that $A$ is not invertible.

EXAMPLE 5 Showing That a Matrix Is Not Invertible 


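Consider the matrix

$$A = \begin{bmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{bmatrix}$$

Applying the procedure of Example 4 yields

$$\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 2 & 4 & -1 & 0 & 1 & 0 \\ -1 & 2 & 5 & 0 & 0 & 1 \end{array}\right]$$

$$\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 8 & 9 & 1 & 0 & 1 \end{array}\right]$$

We added $-2$ times the first row to the second and added the first row to the third.

$$\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right]$$

We added the second row to the third.

Since we have obtained a row of zeros on the left side, $A$ is not invertible.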
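Because every row operation in the algorithm is applied to the whole partitioned matrix, each intermediate stage has the form $[PA \mid P]$, where $P$ is the product of the elementary matrices used so far. So the right half of the final matrix in Example 5 tells us exactly which combination of rows of $A$ produced the zero row. A quick check (the `matmul` helper is ours):

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

A = [[1, 6, 4], [2, 4, -1], [-1, 2, 5]]
P = [[1, 0, 0], [-2, 1, 0], [-1, 1, 1]]  # right half of the final matrix in Example 5

print(matmul(P, A))  # [[1, 6, 4], [0, -8, -9], [0, 0, 0]]
```

The zero third row says that $-r_1 + r_2 + r_3 = 0$ for the rows $r_1, r_2, r_3$ of $A$, which is another way to see that $A$ cannot be invertible.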


EXAMPLE 6 Analyzing Homogeneous Systems 

Use Theorem 1.5.3 to determine whether the given homogeneous system has nontrivial solutions.

(a)
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 0 \\ 2x_1 + 5x_2 + 3x_3 &= 0 \\ x_1 + 8x_3 &= 0 \end{aligned}$$

(b)
$$\begin{aligned} x_1 + 6x_2 + 4x_3 &= 0 \\ 2x_1 + 4x_2 - x_3 &= 0 \\ -x_1 + 2x_2 + 5x_3 &= 0 \end{aligned}$$

From parts (a) and (b) of Theorem 1.5.3, a homogeneous linear system has only the trivial solution if and only if its coefficient matrix is invertible. From Example 4 and Example 5, the coefficient matrix of system (a) is invertible and that of system (b) is not. Thus, system (a) has only the trivial solution, whereas system (b) has nontrivial solutions.
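Theorem 1.5.3 only says that system (b) has nontrivial solutions; one can actually be read off from the echelon rows $[1\ 6\ 4]$ and $[0\ {-8}\ {-9}]$ obtained in Example 5. The back-substitution below is our own computation, offered as a sketch: taking $x_3 = t$ free gives $x_2 = -\frac{9}{8}t$ and $x_1 = -6x_2 - 4x_3$, and $t = 8$ clears the denominators.

```python
from fractions import Fraction

A = [[1, 6, 4], [2, 4, -1], [-1, 2, 5]]  # coefficient matrix of system (b)

def solution(t):
    """Solutions of Ax = 0 from the echelon rows [1 6 4] and [0 -8 -9]:
    x3 = t is free, x2 = -9t/8, x1 = -6*x2 - 4*x3."""
    t = Fraction(t)
    x3 = t
    x2 = -9 * t / 8
    x1 = -6 * x2 - 4 * x3
    return [x1, x2, x3]

x = solution(8)
print([int(v) for v in x])  # [22, -9, 8]
print([sum(a * xi for a, xi in zip(row, x)) for row in A] == [0, 0, 0])  # True
```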


Concept Review 

Row equivalent matrices 
Elementary matrix 
Inverse operations 
Inversion algorithm 

Skills 

Determine whether a given square matrix is an elementary matrix.

Determine whether two square matrices are row equivalent. 

Apply the inverse of a given elementary row operation to a matrix.

Apply elementary row operations to reduce a given square matrix to the identity matrix. 











Understand the relationships between statements that are equivalent to the invertibility of a square 
matrix (Theorem 1.5.3). 

Use the inversion algorithm to find the inverse of an invertible matrix. 

Express an invertible matrix as a product of elementary matrices. 


Exercise Set 1.5 


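1. Decide whether each matrix below is an elementary matrix.

(a) $\begin{bmatrix} 1 & 0 \\ -5 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} -5 & 1 \\ 1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 2 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

(a) Elementary

(b) Not elementary

(c) Not elementary

(d) Not elementary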


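2. Decide whether each matrix below is an elementary matrix.

(a) $\begin{bmatrix} 1 & 0 \\ 0 & \sqrt{3} \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 9 \\ 0 & 0 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$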


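3. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to the identity matrix.

(a) $\begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} -7 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -5 & 0 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

(a) Add 3 times row 2 to row 1: $\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$

(b) Multiply row 1 by $-\frac{1}{7}$: $\begin{bmatrix} -\frac{1}{7} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(c) Add 5 times row 1 to row 3: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 5 & 0 & 1 \end{bmatrix}$

(d) Swap rows 1 and 3: $\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$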


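4. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to the identity matrix.

(a) $\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & \frac{1}{7} & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$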


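5. In each part, an elementary matrix $E$ and a matrix $A$ are given. Write down the row operation corresponding to $E$ and show that the product $EA$ results from applying the row operation to $A$.

(a) $E = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, $A = \begin{bmatrix} -1 & -2 & 5 & -1 \\ 3 & -6 & -6 & -6 \end{bmatrix}$

(b) $E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}$, $A = \begin{bmatrix} 2 & -1 & 0 & -4 & -4 \\ 1 & -3 & -1 & 5 & 3 \\ 2 & 0 & 1 & 3 & -1 \end{bmatrix}$

(c) $E = \begin{bmatrix} 1 & 0 & 4 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$

Answer:

(a) Swap rows 1 and 2: $EA = \begin{bmatrix} 3 & -6 & -6 & -6 \\ -1 & -2 & 5 & -1 \end{bmatrix}$

(b) Add $-3$ times row 2 to row 3: $EA = \begin{bmatrix} 2 & -1 & 0 & -4 & -4 \\ 1 & -3 & -1 & 5 & 3 \\ -1 & 9 & 4 & -12 & -10 \end{bmatrix}$

(c) Add 4 times row 3 to row 1: $EA = \begin{bmatrix} 13 & 28 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$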


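6. In each part, an elementary matrix $E$ and a matrix $A$ are given. Write down the row operation corresponding to $E$ and show that the product $EA$ results from applying the row operation to $A$.

(a) $E = \begin{bmatrix} -6 & 0 \\ 0 & 1 \end{bmatrix}$, $A = \begin{bmatrix} -1 & -2 & 5 & -1 \\ 3 & -6 & -6 & -6 \end{bmatrix}$

(b) $E = \begin{bmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $A = \begin{bmatrix} 2 & -1 & 0 & -4 & -4 \\ 1 & -3 & -1 & 5 & 3 \\ 2 & 0 & 1 & 3 & -1 \end{bmatrix}$

(c) $E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$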



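In Exercises 7-8, use the following matrices.

$$A = \begin{bmatrix} 3 & 4 & 1 \\ 2 & -7 & -1 \\ 8 & 1 & 5 \end{bmatrix},\quad B = \begin{bmatrix} 8 & 1 & 5 \\ 2 & -7 & -1 \\ 3 & 4 & 1 \end{bmatrix},\quad C = \begin{bmatrix} 3 & 4 & 1 \\ 2 & -7 & -1 \\ 2 & -7 & 3 \end{bmatrix}$$

$$D = \begin{bmatrix} 8 & 1 & 5 \\ -6 & 21 & 3 \\ 3 & 4 & 1 \end{bmatrix},\quad F = \begin{bmatrix} 8 & 1 & 5 \\ 8 & 1 & 1 \\ 3 & 4 & 1 \end{bmatrix}$$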

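7. Find an elementary matrix $E$ that satisfies the equation.

(a) $EA = B$

(b) $EB = A$

(c) $EA = C$

(d) $EC = A$

Answer:

(a) $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}$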


8. Find an elementary matrix $E$ that satisfies the equation.

(a) $EB = D$

(b) $ED = B$

(c) $EB = F$

(d) $EF = B$

In Exercises 9-24, use the inversion algorithm to find the inverse of the given matrix, if the inverse exists.

9. $\begin{bmatrix} 1 & 4 \\ 2 & 7 \end{bmatrix}$

Answer:

$\begin{bmatrix} -7 & 4 \\ 2 & -1 \end{bmatrix}$

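10. $\begin{bmatrix} -3 & 6 \\ 4 & 5 \end{bmatrix}$

11. $\begin{bmatrix} -1 & 3 \\ 3 & -2 \end{bmatrix}$

Answer:

$\begin{bmatrix} \frac{2}{7} & \frac{3}{7} \\ \frac{3}{7} & \frac{1}{7} \end{bmatrix}$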


12. $\begin{bmatrix} 6 & -4 \\ -3 & 2 \end{bmatrix}$

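13. $\begin{bmatrix} 3 & 4 & -1 \\ 1 & 0 & 3 \\ 2 & 5 & -4 \end{bmatrix}$

Answer:

$\begin{bmatrix} \frac{3}{2} & -\frac{11}{10} & -\frac{6}{5} \\ -1 & 1 & 1 \\ -\frac{1}{2} & \frac{7}{10} & \frac{2}{5} \end{bmatrix}$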


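14. $\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix}$

15. $\begin{bmatrix} -1 & 3 & -4 \\ 2 & 4 & 1 \\ -4 & 2 & -9 \end{bmatrix}$

Answer:

No inverse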


16 . 


1 

5 

1 

5 

1 

5 


1 _2 

5 5 

1 J_ 
5 10 

4 2_ 

5 10 


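17. $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}$

Answer:

$\begin{bmatrix} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{bmatrix}$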

18. $\begin{bmatrix} \sqrt{2} & 3\sqrt{2} & 0 \\ -4\sqrt{2} & \sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}$

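19. $\begin{bmatrix} 2 & 6 & 6 \\ 2 & 7 & 6 \\ 2 & 7 & 7 \end{bmatrix}$

Answer:

$\begin{bmatrix} \frac{7}{2} & 0 & -3 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$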

20. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 1 & 3 & 5 & 0 \\ 1 & 3 & 5 & 7 \end{bmatrix}$

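21. $\begin{bmatrix} 2 & -4 & 0 & 0 \\ 1 & 2 & 12 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & -1 & -4 & -5 \end{bmatrix}$

Answer:

$\begin{bmatrix} \frac{1}{4} & \frac{1}{2} & -3 & 0 \\ -\frac{1}{8} & \frac{1}{4} & -\frac{3}{2} & 0 \\ 0 & 0 & \frac{1}{2} & 0 \\ \frac{1}{40} & -\frac{1}{20} & -\frac{1}{10} & -\frac{1}{5} \end{bmatrix}$

22. $\begin{bmatrix} -8 & 17 & 2 & \frac{1}{3} \\ 4 & 0 & \frac{2}{5} & -9 \\ 0 & 0 & 0 & 0 \\ -1 & 13 & 4 & 2 \end{bmatrix}$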





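23. $\begin{bmatrix} -1 & 0 & 1 & 0 \\ 2 & 3 & -2 & 6 \\ 0 & -1 & 2 & 0 \\ 0 & 0 & 1 & 5 \end{bmatrix}$

Answer:

$\begin{bmatrix} -\frac{7}{12} & \frac{5}{24} & \frac{5}{8} & -\frac{1}{4} \\ \frac{5}{6} & \frac{5}{12} & \frac{1}{4} & -\frac{1}{2} \\ \frac{5}{12} & \frac{5}{24} & \frac{5}{8} & -\frac{1}{4} \\ -\frac{1}{12} & -\frac{1}{24} & -\frac{1}{8} & \frac{1}{4} \end{bmatrix}$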

24. $\begin{bmatrix} 0 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & -1 & 3 & 0 \\ 2 & 1 & 5 & -3 \end{bmatrix}$

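In Exercises 25-26, find the inverse of each of the following $4 \times 4$ matrices, where $k_1$, $k_2$, $k_3$, $k_4$, and $k$ are all nonzero.

25. (a) $\begin{bmatrix} k_1 & 0 & 0 & 0 \\ 0 & k_2 & 0 & 0 \\ 0 & 0 & k_3 & 0 \\ 0 & 0 & 0 & k_4 \end{bmatrix}$

(b) $\begin{bmatrix} k & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & k & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

(a) $\begin{bmatrix} \frac{1}{k_1} & 0 & 0 & 0 \\ 0 & \frac{1}{k_2} & 0 & 0 \\ 0 & 0 & \frac{1}{k_3} & 0 \\ 0 & 0 & 0 & \frac{1}{k_4} \end{bmatrix}$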





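26. (a) $\begin{bmatrix} 0 & 0 & 0 & k_1 \\ 0 & 0 & k_2 & 0 \\ 0 & k_3 & 0 & 0 \\ k_4 & 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} k & 0 & 0 & 0 \\ 1 & k & 0 & 0 \\ 0 & 1 & k & 0 \\ 0 & 0 & 1 & k \end{bmatrix}$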

In Exercises 27-28, find all values of $c$, if any, for which the given matrix is invertible.

27. $\begin{bmatrix} c & c & c \\ 1 & c & c \\ 1 & 1 & c \end{bmatrix}$

Answer:

$c \neq 0, 1$

28. $\begin{bmatrix} c & 1 & 0 \\ 1 & c & 1 \\ 0 & 1 & c \end{bmatrix}$

In Exercises 29-32, write the given matrix as a product of elementary matrices. 



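30. $\begin{bmatrix} 1 & 0 \\ -5 & 2 \end{bmatrix}$

31. $\begin{bmatrix} 1 & 0 & -2 \\ 0 & 4 & 3 \\ 0 & 0 & 1 \end{bmatrix}$

Answer:

$$\begin{bmatrix} 1 & 0 & -2 \\ 0 & 4 & 3 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

32. $\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$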

In Exercises 33-36, write the inverse of the given matrix as a product of elementary matrices. 

33. The matrix in Exercise 29. 

Answer: 



1 O' 

“7 0 

‘l 

-\ 

'1 o' 


-1 1_ 

4 

0 1 

_0 

1_ 

0 1 


34. The matrix in Exercise 30. 

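35. The matrix in Exercise 31.

Answer:

$$\begin{bmatrix} 1 & 0 & 2 \\ 0 & \frac{1}{4} & -\frac{3}{4} \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$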

36. The matrix in Exercise 32. 

In Exercises 37-38, show that the given matrices $A$ and $B$ are row equivalent, and find a sequence of elementary row operations that produces $B$ from $A$.

37. $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 1 \\ 2 & 1 & 9 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 & 5 \\ 0 & 2 & -2 \\ 1 & 1 & 4 \end{bmatrix}$

Answer:

Add $-1$ times the first row to the second row. Add $-1$ times the first row to the third row. Add $-1$ times the second row to the first row. Add the second row to the third row.


38. $A = \begin{bmatrix} 2 & 1 & 0 \\ -1 & 1 & 0 \\ 3 & 0 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 6 & 9 & 4 \\ -5 & -1 & 0 \\ -1 & -2 & -1 \end{bmatrix}$


39. Show that if

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ a & b & c \end{bmatrix}$$

is an elementary matrix, then at least one entry in the third row must be a zero.



40. Show that

$$A = \begin{bmatrix} 0 & a & 0 & 0 & 0 \\ b & 0 & c & 0 & 0 \\ 0 & d & 0 & e & 0 \\ 0 & 0 & f & 0 & g \\ 0 & 0 & 0 & h & 0 \end{bmatrix}$$

is not invertible for any values of the entries.


41. Prove that if $A$ and $B$ are $n \times n$ matrices, then $A$ and $B$ are row equivalent if and only if $A$ and $B$ have the same reduced row echelon form.


42. Prove that if A is an invertible matrix and B is row equivalent to A, then B is also invertible. 

43. Show that if B is obtained from A by performing a sequence of elementary row operations, then there is a 
second sequence of elementary row operations, which when applied to B recovers A. 


True-False Exercises 


In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The product of two elementary matrices of the same size must be an elementary matrix. 

Answer: 

False 

(b) Every elementary matrix is invertible. 

Answer: 

True 

(c) If A and B are row equivalent, and if B and C are row equivalent, then A and C are row equivalent. 

Answer: 

True 

(d) If A is an n x n matrix that is not invertible, then the linear system Ax = 0 has infinitely many solutions. 
Answer: 

True 

(e) If A is an n x n matrix that is not invertible, then the matrix obtained by interchanging two rows of A cannot 
be invertible. 

Answer: 


True 




(f) If $A$ is invertible and a multiple of the first row of $A$ is added to the second row, then the resulting matrix is invertible.

Answer: 

True 

(g) An expression of the invertible matrix A as a product of elementary matrices is unique. 

Answer: 

False 
1.6 More on Linear Systems and Invertible Matrices

In this section we will show how the inverse of a matrix can be used to solve a linear system and we will develop some more results about 
invertible matrices. 


Number of Solutions of a Linear System 

In Section 1.1 we made the statement (based on Figures 1.1.1 and 1.1.2) that every linear system has either no solutions, has exactly one solution, 
or has infinitely many solutions. We are now in a position to prove this fundamental result. 


THEOREM 1.6.1 

A system of linear equations has zero, one, or infinitely many solutions. There are no other possibilities. 


If $Ax = b$ is a system of linear equations, exactly one of the following is true: (a) the system has no solutions, (b) the system has exactly one solution, or (c) the system has more than one solution. The proof will be complete if we can show that the system has infinitely many solutions in case (c).

Assume that $Ax = b$ has more than one solution, and let $x_0 = x_1 - x_2$, where $x_1$ and $x_2$ are any two distinct solutions. Because $x_1$ and $x_2$ are distinct, the matrix $x_0$ is nonzero; moreover,

$$Ax_0 = A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0$$

If we now let $k$ be any scalar, then

$$A(x_1 + kx_0) = Ax_1 + A(kx_0) = Ax_1 + k(Ax_0) = b + k0 = b + 0 = b$$

But this says that $x_1 + kx_0$ is a solution of $Ax = b$. Since $x_0$ is nonzero, different values of $k$ give different matrices $x_1 + kx_0$, and since there are infinitely many choices for $k$, the system $Ax = b$ has infinitely many solutions.
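The construction in the proof can be watched in action on a small system. The system below is a hypothetical example chosen for illustration (it is not from the text); given two distinct solutions $x_1$ and $x_2$, the code forms $x_0 = x_1 - x_2$ and checks that $x_1 + kx_0$ solves the system for several values of $k$.

```python
def matvec(M, v):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical system with two distinct solutions x1 and x2.
A = [[1, 2], [2, 4]]
b = [3, 6]
x1, x2 = [3, 0], [1, 1]
assert matvec(A, x1) == b and matvec(A, x2) == b

x0 = [p - q for p, q in zip(x1, x2)]  # x0 = x1 - x2 = [2, -1]
print(matvec(A, x0))                  # [0, 0]

# Every x1 + k*x0 is again a solution, and distinct k give distinct vectors.
solutions = [[p + k * q for p, q in zip(x1, x0)] for k in range(-3, 4)]
print(all(matvec(A, x) == b for x in solutions))  # True
print(len({tuple(x) for x in solutions}))         # 7
```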


Solving Linear Systems by Matrix Inversion 

Thus far we have studied two procedures for solving linear systems: Gauss-Jordan elimination and Gaussian elimination. The following theorem provides an actual formula for the solution of a linear system of $n$ equations in $n$ unknowns in the case where the coefficient matrix is invertible.


THEOREM 1.6.2 

If $A$ is an invertible $n \times n$ matrix, then for each $n \times 1$ matrix $b$, the system of equations $Ax = b$ has exactly one solution, namely, $x = A^{-1}b$.


Since $A(A^{-1}b) = b$, it follows that $x = A^{-1}b$ is a solution of $Ax = b$. To show that this is the only solution, we will assume that $x_0$ is an arbitrary solution and then show that $x_0$ must be the solution $A^{-1}b$.

If $x_0$ is any solution of $Ax = b$, then $Ax_0 = b$. Multiplying both sides of this equation by $A^{-1}$, we obtain $x_0 = A^{-1}b$.

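EXAMPLE 1 Solution of a Linear System Using $A^{-1}$

Consider the system of linear equations

$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 5 \\ 2x_1 + 5x_2 + 3x_3 &= 3 \\ x_1 + 8x_3 &= 17 \end{aligned}$$

In matrix form this system can be written as $Ax = b$, where

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix},\quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\quad b = \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix}$$

In Example 4 of the preceding section, we showed that $A$ is invertible and

$$A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$$

By Theorem 1.6.2, the solution of the system is

$$x = A^{-1}b = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}\begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$$

or $x_1 = 1$, $x_2 = -1$, $x_3 = 2$.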
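The arithmetic of Example 1 is a single matrix-vector product, which the sketch below carries out using the inverse found in Example 4 of Section 1.5 (the helper `matvec` is ours). Substituting the result back into $Ax$ confirms it.

```python
def matvec(M, v):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
A_inv = [[-40, 16, 9], [13, -5, -3], [5, -2, -1]]  # from Example 4 of Section 1.5
b = [5, 3, 17]

x = matvec(A_inv, b)
print(x)                   # [1, -1, 2]
print(matvec(A, x) == b)   # True
```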


Keep in mind that the method of Example 1 only applies when the 
system has as many equations as unknowns and the coefficient 
matrix is invertible. 


Linear Systems with a Common Coefficient Matrix 

Frequently, one is concerned with solving a sequence of systems

$$Ax = b_1,\quad Ax = b_2,\quad Ax = b_3,\quad \ldots,\quad Ax = b_k$$

each of which has the same square coefficient matrix $A$. If $A$ is invertible, then the solutions

$$x_1 = A^{-1}b_1,\quad x_2 = A^{-1}b_2,\quad x_3 = A^{-1}b_3,\quad \ldots,\quad x_k = A^{-1}b_k$$

can be obtained with one matrix inversion and $k$ matrix multiplications. An efficient way to do this is to form the partitioned matrix

$$[A \mid b_1 \mid b_2 \mid \cdots \mid b_k] \tag{1}$$

in which the coefficient matrix $A$ is “augmented” by all $k$ of the matrices $b_1, b_2, \ldots, b_k$, and then reduce (1) to reduced row echelon form by Gauss-Jordan elimination. In this way we can solve all $k$ systems at once. This method has the added advantage that it applies even when $A$ is not invertible.
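A minimal sketch of the "augment with all right-hand sides at once" idea, applied to the two systems of Example 2 below. The `rref` function is our own straightforward Gauss-Jordan routine (pivot = first nonzero entry), not a library call; because it works on any rectangular matrix, it also runs when $A$ is not invertible, in which case the last columns must be interpreted row by row rather than read off directly.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M (a list of rows), using exact fractions."""
    R = [[Fraction(x) for x in row] for row in M]
    nrows, ncols = len(R), len(R[0])
    pivot_row = 0
    for col in range(ncols):
        if pivot_row == nrows:
            break
        pivot = next((r for r in range(pivot_row, nrows) if R[r][col] != 0), None)
        if pivot is None:
            continue                              # no pivot in this column
        R[pivot_row], R[pivot] = R[pivot], R[pivot_row]
        c = R[pivot_row][col]
        R[pivot_row] = [x / c for x in R[pivot_row]]
        for r in range(nrows):
            if r != pivot_row and R[r][col] != 0:
                m = R[r][col]
                R[r] = [x - m * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
    return R

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
b1, b2 = [4, 5, 9], [1, 6, -6]
augmented = [row + [p, q] for row, p, q in zip(A, b1, b2)]  # [A | b1 | b2]

R = rref(augmented)
x_a = [row[3] for row in R]  # solution of Ax = b1
x_b = [row[4] for row in R]  # solution of Ax = b2
print([int(v) for v in x_a], [int(v) for v in x_b])  # [1, 0, 1] [2, 1, -1]
```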

EXAMPLE 2 Solving Two Linear Systems at Once 


Solve the systems

(a)
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 4 \\ 2x_1 + 5x_2 + 3x_3 &= 5 \\ x_1 + 8x_3 &= 9 \end{aligned}$$

(b)
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 1 \\ 2x_1 + 5x_2 + 3x_3 &= 6 \\ x_1 + 8x_3 &= -6 \end{aligned}$$

The two systems have the same coefficient matrix. If we augment this coefficient matrix with the columns of constants on the right sides of these systems, we obtain

$$\left[\begin{array}{ccc|c|c} 1 & 2 & 3 & 4 & 1 \\ 2 & 5 & 3 & 5 & 6 \\ 1 & 0 & 8 & 9 & -6 \end{array}\right]$$

Reducing this matrix to reduced row echelon form yields (verify)

$$\left[\begin{array}{ccc|c|c} 1 & 0 & 0 & 1 & 2 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & -1 \end{array}\right]$$

It follows from the last two columns that the solution of system (a) is $x_1 = 1$, $x_2 = 0$, $x_3 = 1$ and the solution of system (b) is $x_1 = 2$, $x_2 = 1$, $x_3 = -1$.


























Properties of Invertible Matrices 

Up to now, to show that an $n \times n$ matrix $A$ is invertible, it has been necessary to find an $n \times n$ matrix $B$ such that

$$AB = I \quad\text{and}\quad BA = I$$

The next theorem shows that if we produce an $n \times n$ matrix $B$ satisfying either condition, then the other condition holds automatically.


THEOREM 1.6.3 

Let $A$ be a square matrix.

(a) If $B$ is a square matrix satisfying $BA = I$, then $B = A^{-1}$.

(b) If $B$ is a square matrix satisfying $AB = I$, then $B = A^{-1}$.


We will prove part (a) and leave part (b) as an exercise.

Assume that $BA = I$. If we can show that $A$ is invertible, the proof can be completed by multiplying $BA = I$ on both sides by $A^{-1}$ to obtain

$$BAA^{-1} = IA^{-1} \quad\text{or}\quad BI = IA^{-1} \quad\text{or}\quad B = A^{-1}$$

To show that $A$ is invertible, it suffices to show that the system $Ax = 0$ has only the trivial solution (see Theorem 1.5.3). Let $x_0$ be any solution of this system. If we multiply both sides of $Ax_0 = 0$ on the left by $B$, we obtain $BAx_0 = B0$ or $Ix_0 = 0$ or $x_0 = 0$. Thus, the system of equations $Ax = 0$ has only the trivial solution.


Equivalence Theorem 

We are now in a position to add two more statements to the four given in Theorem 1.5.3. 


THEOREM 1.6.4 Equivalent Statements

If $A$ is an $n \times n$ matrix, then the following are equivalent.

(a) $A$ is invertible.

(b) $Ax = 0$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $Ax = b$ is consistent for every $n \times 1$ matrix $b$.

(f) $Ax = b$ has exactly one solution for every $n \times 1$ matrix $b$.


It follows from the equivalence of parts (e) and (f) that if you can show that $Ax = b$ has at least one solution for every $n \times 1$ matrix $b$, then you can conclude that it has exactly one solution for every $n \times 1$ matrix $b$.

Since we proved in Theorem 1.5.3 that (a), (b), (c), and (d) are equivalent, it will be sufficient to prove that (a) $\Rightarrow$ (f) $\Rightarrow$ (e) $\Rightarrow$ (a).

(a) $\Rightarrow$ (f) This was already proved in Theorem 1.6.2.

(f) $\Rightarrow$ (e) This is self-evident, for if $Ax = b$ has exactly one solution for every $n \times 1$ matrix $b$, then $Ax = b$ is consistent for every $n \times 1$ matrix $b$.


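(e) $\Rightarrow$ (a) If the system $Ax = b$ is consistent for every $n \times 1$ matrix $b$, then, in particular, this is so for the systems

$$Ax = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad Ax = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad \ldots,\quad Ax = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$

Let $x_1, x_2, \ldots, x_n$ be solutions of the respective systems, and let us form an $n \times n$ matrix $C$ having these solutions as columns. Thus $C$ has the form

$$C = [x_1 \mid x_2 \mid \cdots \mid x_n]$$

As discussed in Section 1.3, the successive columns of the product $AC$ will be

$$Ax_1,\ Ax_2,\ \ldots,\ Ax_n$$

[see Formula 8 of Section 1.3]. Thus,

$$AC = [Ax_1 \mid Ax_2 \mid \cdots \mid Ax_n] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I$$

By part (b) of Theorem 1.6.3, it follows that $C = A^{-1}$. Thus, $A$ is invertible.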
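The step (e) $\Rightarrow$ (a) doubles as a recipe: solve $Ax = e_j$ for each standard column $e_j$ and use the solutions as the columns of $C$. The sketch below does this for the matrix of Example 1 with a small Gauss-Jordan solver of our own (`solve`, which assumes $A$ is invertible and would raise `StopIteration` otherwise), then confirms that $AC = I$.

```python
from fractions import Fraction

def solve(A, b):
    """Solve Ax = b by Gauss-Jordan elimination on [A | b].
    Assumes A is square and invertible (next() raises StopIteration otherwise)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        c = M[col][col]
        M[col] = [x / c for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                m = M[r][col]
                M[r] = [x - m * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
n = len(A)
identity = [[int(i == j) for j in range(n)] for i in range(n)]

columns = [solve(A, [int(i == j) for i in range(n)]) for j in range(n)]  # x_j solves Ax = e_j
C = [[columns[j][i] for j in range(n)] for i in range(n)]  # x_j becomes column j of C
AC = [[sum(A[i][k] * C[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
print(AC == identity)  # True
```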


We know from earlier work that invertible matrix factors produce an invertible product. Conversely, the following theorem shows that if the product of square matrices is invertible, then the factors themselves must be invertible.


THEOREM 1.6.5 

Let A and B be square matrices of the same size. If AB is invertible, then A and B must also be invertible. 


In our later work the following fundamental problem will occur frequently in various contexts. 


A Fundamental Problem 

Let $A$ be a fixed $m \times n$ matrix. Find all $m \times 1$ matrices $b$ such that the system of equations $Ax = b$ is consistent.


If $A$ is an invertible matrix, Theorem 1.6.2 completely solves this problem by asserting that for every $n \times 1$ matrix $b$, the linear system $Ax = b$ has the unique solution $x = A^{-1}b$. If $A$ is not square, or if $A$ is square but not invertible, then Theorem 1.6.2 does not apply. In these cases the matrix $b$ must usually satisfy certain conditions in order for $Ax = b$ to be consistent. The following example illustrates how the methods of Section 1.2 can be used to determine such conditions.

EXAMPLE 3 Determining Consistency by Elimination 

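What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

$$\begin{aligned} x_1 + x_2 + 2x_3 &= b_1 \\ x_1 + x_3 &= b_2 \\ 2x_1 + x_2 + 3x_3 &= b_3 \end{aligned}$$

to be consistent?

Solution The augmented matrix is

$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 1 & 0 & 1 & b_2 \\ 2 & 1 & 3 & b_3 \end{array}\right]$$

which can be reduced to row echelon form as follows:

$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & -1 & -1 & b_2 - b_1 \\ 0 & -1 & -1 & b_3 - 2b_1 \end{array}\right]$$

$-1$ times the first row was added to the second and $-2$ times the first row was added to the third.

$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_1 - b_2 \\ 0 & -1 & -1 & b_3 - 2b_1 \end{array}\right]$$

The second row was multiplied by $-1$.

$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_1 - b_2 \\ 0 & 0 & 0 & b_3 - b_2 - b_1 \end{array}\right]$$

The second row was added to the third.

It is now evident from the third row in the matrix that the system has a solution if and only if $b_1$, $b_2$, and $b_3$ satisfy the condition

$$b_3 - b_2 - b_1 = 0 \quad\text{or}\quad b_3 = b_1 + b_2$$

To express this condition another way, $Ax = b$ is consistent if and only if $b$ is a matrix of the form

$$b = \begin{bmatrix} b_1 \\ b_2 \\ b_1 + b_2 \end{bmatrix}$$

where $b_1$ and $b_2$ are arbitrary.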
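The elimination argument of Example 3 can be turned into a small consistency test: reduce $[A \mid b]$ to row echelon form and look for a row of the form $[0\ 0\ \cdots\ 0 \mid \text{nonzero}]$. The function name `is_consistent` is our own, and this is a sketch for small exact examples rather than a robust numerical routine. It agrees with the condition $b_3 = b_1 + b_2$ derived above.

```python
from fractions import Fraction

def is_consistent(A, b):
    """Row reduce [A | b] to row echelon form and report whether any row
    reads 0 = nonzero (the only way Ax = b can fail to be consistent)."""
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    nrows, ncols = len(M), len(A[0])
    pivot_row = 0
    for col in range(ncols):  # pivots only in the coefficient columns
        pivot = next((r for r in range(pivot_row, nrows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        for r in range(pivot_row + 1, nrows):
            m = M[r][col] / M[pivot_row][col]
            M[r] = [x - m * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return not any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in M)

A = [[1, 1, 2], [1, 0, 1], [2, 1, 3]]  # coefficient matrix of Example 3
print(is_consistent(A, [1, 2, 3]))     # True:  b3 = b1 + b2
print(is_consistent(A, [1, 2, 4]))     # False: b3 != b1 + b2
```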


EXAMPLE 4 Determining Consistency by Elimination 

What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= b_1 \\ 2x_1 + 5x_2 + 3x_3 &= b_2 \\ x_1 + 8x_3 &= b_3 \end{aligned}$$

to be consistent?

Solution The augmented matrix is

$$\left[\begin{array}{ccc|c} 1 & 2 & 3 & b_1 \\ 2 & 5 & 3 & b_2 \\ 1 & 0 & 8 & b_3 \end{array}\right]$$

Reducing this to reduced row echelon form yields (verify)

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & -40b_1 + 16b_2 + 9b_3 \\ 0 & 1 & 0 & 13b_1 - 5b_2 - 3b_3 \\ 0 & 0 & 1 & 5b_1 - 2b_2 - b_3 \end{array}\right]$$

In this case there are no restrictions on $b_1$, $b_2$, and $b_3$, so the system has the unique solution

$$x_1 = -40b_1 + 16b_2 + 9b_3,\quad x_2 = 13b_1 - 5b_2 - 3b_3,\quad x_3 = 5b_1 - 2b_2 - b_3$$

for all values of $b_1$, $b_2$, and $b_3$.


( 2 ) 


(3) 


What does the result in Example 4 tell you about the coefficient 
matrix of the system? 


Skills 

Determine whether a linear system of equations has no solutions, exactly one solution, or infinitely many solutions. 
Solve linear systems by inverting their coefficient matrices.

Solve multiple linear systems with the same coefficient matrix simultaneously. 














Be familiar with the additional conditions of invertibility stated in the Equivalence Theorem. 


Exercise Set 1.6 


In Exercises 1-8, solve the system by inverting the coefficient matrix and using Theorem 1.6.2.

1. $x_1 + x_2 = 2$
$5x_1 + 6x_2 = 9$

Answer:

$x_1 = 3$, $x_2 = -1$

2. $4x_1 - 3x_2 = -3$
$2x_1 - 5x_2 = 9$

3. $x_1 + 3x_2 + x_3 = 4$
$2x_1 + 2x_2 + x_3 = -1$
$2x_1 + 3x_2 + x_3 = 3$

Answer:

$x_1 = -1$, $x_2 = 4$, $x_3 = -7$

4. $5x_1 + 3x_2 + 2x_3 = 4$
$3x_1 + 3x_2 + 2x_3 = 2$
$x_2 + x_3 = 5$

5. $x + y + z = 5$
$x + y - 4z = 10$
$-4x + y + z = 0$

Answer:

$x = 1$, $y = 5$, $z = -1$

6. $-x - 2y - 3z = 0$
$w + x + 4y + 4z = 7$
$w + 3x + 7y + 9z = 4$
$-w - 2x - 4y - 6z = 6$

7. $3x_1 + 5x_2 = b_1$
$x_1 + 2x_2 = b_2$

Answer:

$x_1 = 2b_1 - 5b_2$, $x_2 = -b_1 + 3b_2$

8. $x_1 + 2x_2 + 3x_3 = b_1$
$2x_1 + 5x_2 + 5x_3 = b_2$
$3x_1 + 5x_2 + 8x_3 = b_3$

In Exercises 9-12, solve the linear systems together by reducing the appropriate augmented matrix.

9. $x_1 - 5x_2 = b_1$
$3x_1 + 2x_2 = b_2$

(i) $b_1 = 1$, $b_2 = 4$

(ii) $b_1 = -2$, $b_2 = 5$

Answer:

(i) $x_1 = \frac{22}{17}$, $x_2 = \frac{1}{17}$

(ii) $x_1 = \frac{21}{17}$, $x_2 = \frac{11}{17}$


10. $-x_1 + 4x_2 + x_3 = b_1$
$x_1 + 9x_2 - 2x_3 = b_2$
$6x_1 + 4x_2 - 8x_3 = b_3$

(i) $b_1 = 0$, $b_2 = 1$, $b_3 = 0$

(ii) $b_1 = -3$, $b_2 = 4$, $b_3 = -5$

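11. $\begin{aligned} 4x_1 - 7x_2 &= b_1 \\ x_1 + 2x_2 &= b_2 \end{aligned}$

(i) $b_1 = 0$, $b_2 = 1$

(ii) $b_1 = -4$, $b_2 = 6$

(iii) $b_1 = -1$, $b_2 = 3$

(iv) $b_1 = -5$, $b_2 = 1$

Answer:

(i) $x_1 = \tfrac{7}{15}$, $x_2 = \tfrac{4}{15}$

(ii) $x_1 = \tfrac{34}{15}$, $x_2 = \tfrac{28}{15}$

(iii) $x_1 = \tfrac{19}{15}$, $x_2 = \tfrac{13}{15}$

(iv) $x_1 = -\tfrac{1}{5}$, $x_2 = \tfrac{3}{5}$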


12. $\begin{aligned} x_1 + 3x_2 + 5x_3 &= b_1 \\ -x_1 - 2x_2 &= b_2 \\ 2x_1 + 5x_2 + 4x_3 &= b_3 \end{aligned}$

(i) $b_1 = 1$, $b_2 = 0$, $b_3 = -1$

(ii) $b_1 = 0$, $b_2 = 1$, $b_3 = 1$

(iii) $b_1 = -1$, $b_2 = -1$, $b_3 = 0$


In Exercises 13-17, determine conditions on the $b_i$'s, if any, in order to guarantee that the linear system is consistent.

13. $\begin{aligned} x_1 + 3x_2 &= b_1 \\ -2x_1 + x_2 &= b_2 \end{aligned}$

Answer:

No conditions on $b_1$ and $b_2$

14. $\begin{aligned} 6x_1 - 4x_2 &= b_1 \\ 2x_1 - 2x_2 &= b_2 \end{aligned}$

15. $\begin{aligned} x_1 - 2x_2 + 5x_3 &= b_1 \\ 4x_1 - 5x_2 + 8x_3 &= b_2 \\ -3x_1 + 3x_2 - 3x_3 &= b_3 \end{aligned}$

Answer:

$b_3 = b_1 - b_2$

16. $\begin{aligned} x_1 - 2x_2 - x_3 &= b_1 \\ -4x_1 + 5x_2 + 2x_3 &= b_2 \\ -4x_1 + 7x_2 + 4x_3 &= b_3 \end{aligned}$

17. $\begin{aligned} x_1 - x_2 + 3x_3 + 2x_4 &= b_1 \\ -2x_1 + x_2 + 5x_3 + x_4 &= b_2 \\ -3x_1 + 2x_2 + 2x_3 - x_4 &= b_3 \\ 4x_1 - 3x_2 + x_3 + 3x_4 &= b_4 \end{aligned}$

Answer:

$b_1 = b_3 + b_4$, $b_2 = 2b_3 + b_4$


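18. Consider the matrices

$$A = \begin{bmatrix} 2 & 1 & 2 \\ 2 & 2 & -2 \\ 3 & 1 & 1 \end{bmatrix} \quad \text{and} \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

(a) Show that the equation $Ax = x$ can be rewritten as $(A - I)x = 0$ and use this result to solve $Ax = x$ for $x$.

(b) Solve $Ax = 4x$.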

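In Exercises 19-20, solve the given matrix equation for $X$.

19. $\begin{bmatrix} 1 & -1 & 1 \\ 2 & 3 & 0 \\ 0 & 2 & -1 \end{bmatrix} X = \begin{bmatrix} 2 & -1 & 5 & 7 & 8 \\ 4 & 0 & -3 & 0 & 1 \\ 3 & 5 & -7 & 2 & 1 \end{bmatrix}$

Answer:

$X = \begin{bmatrix} 11 & 12 & -3 & 27 & 26 \\ -6 & -8 & 1 & -18 & -17 \\ -15 & -21 & 9 & -38 & -35 \end{bmatrix}$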

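20. $\begin{bmatrix} -2 & 0 & 1 \\ 0 & -1 & -1 \\ 1 & 1 & -4 \end{bmatrix} X = \begin{bmatrix} 4 & 3 & 2 & 1 \\ 6 & 7 & 8 & 9 \\ 1 & 3 & 7 & 9 \end{bmatrix}$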


21. Let $Ax = 0$ be a homogeneous system of $n$ linear equations in $n$ unknowns that has only the trivial solution. Show that if $k$ is any positive integer, then the system $A^kx = 0$ also has only the trivial solution.

22. Let $Ax = 0$ be a homogeneous system of $n$ linear equations in $n$ unknowns, and let $Q$ be an invertible $n \times n$ matrix. Show that $Ax = 0$ has just the trivial solution if and only if $(QA)x = 0$ has just the trivial solution.

23. Let $Ax = b$ be any consistent system of linear equations, and let $x_1$ be a fixed solution. Show that every solution to the system can be written in the form $x = x_1 + x_0$, where $x_0$ is a solution to $Ax = 0$. Show also that every matrix of this form is a solution.

24. Use part (a) of Theorem 1.6.3 to prove part (b). 

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) It is impossible for a system of linear equations to have exactly two solutions.

Answer: 

True 

(b) If the linear system Ax = b has a unique solution, then the linear system Ax = c also must have a unique solution. 

Answer: 

True 

(c) If $A$ and $B$ are $n \times n$ matrices such that $AB = I_n$, then $BA = I_n$.

Answer: 

True 

(d) If $A$ and $B$ are row equivalent matrices, then the linear systems $Ax = 0$ and $Bx = 0$ have the same solution set.

Answer: 

True 

(e) If $A$ is an $n \times n$ matrix and $S$ is an $n \times n$ invertible matrix, then if $x$ is a solution to the linear system $(S^{-1}AS)x = b$, then $Sx$ is a solution to the linear system $Ay = Sb$.

Answer: 


True 
















(f) Let $A$ be an $n \times n$ matrix. The linear system $Ax = 4x$ has a unique solution if and only if $A - 4I$ is an invertible matrix.


Answer: 

True 

(g) Let $A$ and $B$ be $n \times n$ matrices. If $A$ or $B$ (or both) are not invertible, then neither is $AB$.
Answer: 

True 




1.7 Diagonal, Triangular, and Symmetric Matrices 

In this section we will discuss matrices that have various special forms. These matrices arise in a wide variety of applications 
and will also play an important role in our subsequent work. 


Diagonal Matrices 


A square matrix in which all the entries off the main diagonal are zero is called a diagonal matrix. Here are some examples: 

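$$\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 6 & 0 & 0 & 0 \\ 0 & -4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 8 \end{bmatrix}$$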


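A general $n \times n$ diagonal matrix $D$ can be written as

$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} \tag{1}$$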


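A diagonal matrix is invertible if and only if all of its diagonal entries are nonzero; in this case the inverse of (1) is

$$D^{-1} = \begin{bmatrix} 1/d_1 & 0 & \cdots & 0 \\ 0 & 1/d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/d_n \end{bmatrix} \tag{2}$$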


Confirm Formula (2) by showing that

$$DD^{-1} = D^{-1}D = I$$


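Powers of diagonal matrices are easy to compute; we leave it for you to verify that if $D$ is the diagonal matrix (1) and $k$ is a positive integer, then

$$D^k = \begin{bmatrix} d_1^k & 0 & \cdots & 0 \\ 0 & d_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n^k \end{bmatrix} \tag{3}$$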


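EXAMPLE 1 Inverses and Powers of Diagonal Matrices

If

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

then

$$A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{3} & 0 \\ 0 & 0 & \tfrac{1}{2} \end{bmatrix}, \quad A^{5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -243 & 0 \\ 0 & 0 & 32 \end{bmatrix}, \quad A^{-5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{243} & 0 \\ 0 & 0 & \tfrac{1}{32} \end{bmatrix}$$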
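Because only the diagonal entries change, a diagonal matrix can be stored as the list of its diagonal entries, and Formulas (2) and (3) then act entrywise. A minimal sketch that reproduces Example 1; the helper `diag_power` is our own name, and representing $D$ by its diagonal alone is a design choice of the sketch, not something the text prescribes.

```python
from fractions import Fraction


def diag_power(d, k):
    """Diagonal entries of D**k, where D has diagonal entries `d`.

    Follows Formula (3) entrywise for any integer k; a negative k uses the
    inverse from Formula (2), so every entry of `d` must then be nonzero.
    """
    if k < 0 and any(v == 0 for v in d):
        raise ValueError("D is not invertible (a diagonal entry is zero)")
    return [Fraction(v) ** k for v in d]


d = [1, -3, 2]  # diagonal entries of the matrix A in Example 1
for k in (-1, 5, -5):
    print(k, [str(v) for v in diag_power(d, k)])
# -1 ['1', '-1/3', '1/2']
# 5 ['1', '-243', '32']
# -5 ['1', '-1/243', '1/32']
```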


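Matrix products that involve diagonal factors are especially easy to compute. For example,

$$\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} = \begin{bmatrix} d_1a_{11} & d_1a_{12} & d_1a_{13} & d_1a_{14} \\ d_2a_{21} & d_2a_{22} & d_2a_{23} & d_2a_{24} \\ d_3a_{31} & d_3a_{32} & d_3a_{33} & d_3a_{34} \end{bmatrix}$$

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix} \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} = \begin{bmatrix} d_1a_{11} & d_2a_{12} & d_3a_{13} \\ d_1a_{21} & d_2a_{22} & d_3a_{23} \\ d_1a_{31} & d_2a_{32} & d_3a_{33} \\ d_1a_{41} & d_2a_{42} & d_3a_{43} \end{bmatrix}$$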


In words, to multiply a matrix A on the left by a diagonal matrix D, one can multiply successive rows of A by the 
successive diagonal entries of D, and to multiply A on the right by D, one can multiply successive columns of A by the 
successive diagonal entries of D. 
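The row and column rules can be checked against an ordinary matrix product. A small sketch in plain Python; the helpers `matmul`, `diag`, `scale_rows`, and `scale_cols` and the sample matrices are our own illustrative choices.

```python
def matmul(x, y):
    """Ordinary matrix product of x (p x q) and y (q x r), both lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def diag(d):
    """Square diagonal matrix whose main diagonal is d."""
    return [[d[i] if i == j else 0 for j in range(len(d))] for i in range(len(d))]


def scale_rows(d, a):
    """D A by inspection: multiply row i of a by d[i]."""
    return [[d[i] * v for v in row] for i, row in enumerate(a)]


def scale_cols(a, d):
    """A D by inspection: multiply column j of a by d[j]."""
    return [[v * d[j] for j, v in enumerate(row)] for row in a]


d = [2, -1, 3]                                        # illustrative diagonal entries
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]     # 3 x 4, like the first product
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]   # 4 x 3, like the second product
print(matmul(diag(d), A) == scale_rows(d, A))  # True
print(matmul(B, diag(d)) == scale_cols(B, d))  # True
```

Scaling rows or columns directly takes one multiplication per entry, whereas the general product also multiplies by all the zeros of $D$.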


Triangular Matrices 

A square matrix in which all the entries above the main diagonal are zero is called lower triangular, and a square matrix in which all the entries below the main diagonal are zero is called upper triangular. A matrix that is either upper triangular or lower triangular is called triangular.

EXAMPLE 2 Upper and Lower Triangular Matrices 


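$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix} \qquad \begin{bmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$

A general $4 \times 4$ upper triangular matrix (left) and a general $4 \times 4$ lower triangular matrix (right).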


Observe that diagonal matrices are both upper triangular and lower triangular since they have zeros below and 
above the main diagonal. Observe also that a square matrix in row echelon form is upper triangular since it has zeros below 
the main diagonal. 


Properties of Triangular Matrices 

























Example 2 illustrates the following four facts about triangular matrices that we will state without formal proof.

A square matrix $A = [a_{ij}]$ is upper triangular if and only if all entries to the left of the main diagonal are zero; that is, $a_{ij} = 0$ if $i > j$ (Figure 1.7.1).

A square matrix $A = [a_{ij}]$ is lower triangular if and only if all entries to the right of the main diagonal are zero; that is, $a_{ij} = 0$ if $i < j$ (Figure 1.7.1).

A square matrix $A = [a_{ij}]$ is upper triangular if and only if the $i$th row starts with at least $i - 1$ zeros for every $i$.

A square matrix $A = [a_{ij}]$ is lower triangular if and only if the $j$th column starts with at least $j - 1$ zeros for every $j$.

Figure 1.7.1

The following theorem lists some of the basic properties of triangular matrices.
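The index conditions $a_{ij} = 0$ for $i > j$ and $a_{ij} = 0$ for $i < j$ translate directly into tests. A minimal sketch; the function names are our own, and the sample matrices are the upper triangular matrix $A$ of Example 3 below and an arbitrary diagonal matrix.

```python
def is_upper_triangular(a):
    """a_ij = 0 whenever i > j (every entry below the main diagonal is zero)."""
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(n) if i > j)


def is_lower_triangular(a):
    """a_ij = 0 whenever i < j (every entry above the main diagonal is zero)."""
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(n) if i < j)


# Python indexes from 0 rather than 1; shifting both i and j by one
# does not change whether i > j or i < j.
U = [[1, 3, -1], [0, 2, 4], [0, 0, 5]]
D = [[2, 0, 0], [0, -5, 0], [0, 0, 7]]
print(is_upper_triangular(U), is_lower_triangular(U))  # True False
print(is_upper_triangular(D), is_lower_triangular(D))  # True True
```

The diagonal matrix passes both tests, as the remark after Example 2 predicts.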


THEOREM 1.7.1 

(a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is 
lower triangular. 

(b) The product of lower triangular matrices is lower triangular, and the product of upper triangular matrices is upper 
triangular. 

(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero. 

(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper 
triangular matrix is upper triangular. 


Part (a) is evident from the fact that transposing a square matrix can be accomplished by reflecting the entries about the main 
diagonal; we omit the formal proof. We will prove (b), but we will defer the proofs of (c) and (d) to the next chapter, where 
we will have the tools to prove those results more efficiently. 

Proof (b) We will prove the result for lower triangular matrices; the proof for upper triangular matrices is similar. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be lower triangular $n \times n$ matrices, and let $C = [c_{ij}]$ be the product $C = AB$. We can prove that $C$ is lower triangular by showing that $c_{ij} = 0$ for $i < j$. But from the definition of matrix multiplication,

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$$

If we assume that $i < j$, then the terms in this expression can be grouped as follows:

$$c_{ij} = \underbrace{a_{i1}b_{1j} + \cdots + a_{i(j-1)}b_{(j-1)j}}_{\text{terms in which the row number of } b \text{ is less than the column number of } b} + \underbrace{a_{ij}b_{jj} + \cdots + a_{in}b_{nj}}_{\text{terms in which the row number of } a \text{ is less than the column number of } a}$$

In the first grouping all of the $b$ factors are zero since $B$ is lower triangular, and in the second grouping all of the $a$ factors are zero since $A$ is lower triangular. Thus, $c_{ij} = 0$, which is what we wanted to prove.


EXAMPLE 3 Computations with Triangular Matrices 






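Consider the upper triangular matrices

$$A = \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 4 \\ 0 & 0 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & -2 & 2 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix}$$

It follows from part (c) of Theorem 1.7.1 that the matrix $A$ is invertible but the matrix $B$ is not. Moreover, the theorem also tells us that $A^{-1}$, $AB$, and $BA$ must be upper triangular. We leave it for you to confirm these three statements by showing that

$$A^{-1} = \begin{bmatrix} 1 & -\tfrac{3}{2} & \tfrac{7}{5} \\ 0 & \tfrac{1}{2} & -\tfrac{2}{5} \\ 0 & 0 & \tfrac{1}{5} \end{bmatrix}, \quad AB = \begin{bmatrix} 3 & -2 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 5 \end{bmatrix}, \quad BA = \begin{bmatrix} 3 & 5 & -1 \\ 0 & 0 & -5 \\ 0 & 0 & 5 \end{bmatrix}$$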
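The three statements in Example 3 can also be confirmed by direct computation. A short sketch in plain Python; `matmul` and `is_upper_triangular` are our own helpers, and $A^{-1}$ is typed in from the display above and checked, not computed.

```python
from fractions import Fraction as F


def matmul(x, y):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def is_upper_triangular(a):
    """True when every entry below the main diagonal (j < i) is zero."""
    return all(a[i][j] == 0 for i in range(len(a)) for j in range(i))


A = [[1, 3, -1], [0, 2, 4], [0, 0, 5]]
B = [[3, -2, 2], [0, 0, -1], [0, 0, 1]]
A_inv = [[1, F(-3, 2), F(7, 5)], [0, F(1, 2), F(-2, 5)], [0, 0, F(1, 5)]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

AB, BA = matmul(A, B), matmul(B, A)
print(AB)  # [[3, -2, -2], [0, 0, 2], [0, 0, 5]]
print(BA)  # [[3, 5, -1], [0, 0, -5], [0, 0, 5]]
print(matmul(A, A_inv) == I and matmul(A_inv, A) == I)        # True
print(all(is_upper_triangular(M) for M in (A_inv, AB, BA)))   # True
```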


Symmetric Matrices 

DEFINITION 1

A square matrix $A$ is said to be symmetric if $A = A^T$.

It is easy to recognize a symmetric matrix by inspection: The entries on the main diagonal have no restrictions, but mirror images of entries across the main diagonal must be equal. Here is a picture using the second matrix in Example 4:

All diagonal matrices, such as the third matrix in Example 4, obviously have this property.


EXAMPLE 4 Symmetric Matrices 


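The following matrices are symmetric, since each is equal to its own transpose (verify).

$$\begin{bmatrix} 7 & -3 \\ -3 & 5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 4 & 5 \\ 4 & -3 & 0 \\ 5 & 0 & 7 \end{bmatrix}, \quad \begin{bmatrix} d_1 & 0 & 0 & 0 \\ 0 & d_2 & 0 & 0 \\ 0 & 0 & d_3 & 0 \\ 0 & 0 & 0 & d_4 \end{bmatrix}$$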




















It follows from Formula (11) of Section 1.3 that a square matrix $A = [a_{ij}]$ is symmetric if and only if

$$(A)_{ij} = (A)_{ji} \tag{4}$$

for all values of $i$ and $j$.
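Formula (4) gives a direct test for symmetry. A minimal sketch; `transpose` and `is_symmetric` are our own helper names, and comparing each entry above the diagonal with its mirror image is enough because diagonal entries are unrestricted.

```python
def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def is_symmetric(a):
    """Formula (4): a is square and a_ij == a_ji for all i and j."""
    n = len(a)
    if any(len(row) != n for row in a):
        return False  # only square matrices can be symmetric
    # Diagonal entries are unrestricted; check each entry above the diagonal.
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(i + 1, n))


M = [[1, 4, 5], [4, -3, 0], [5, 0, 7]]     # second matrix of Example 4
print(is_symmetric(M), M == transpose(M))  # True True
print(is_symmetric([[1, 2], [3, 1]]))      # False
```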

The following theorem lists the main algebraic properties of symmetric matrices. The proofs are direct consequences of 
Theorem 1.4.8 and are omitted. 


THEOREM 1.7.2 

If A and B are symmetric matrices with the same size, and if k is any scalar, then: 

(a) $A^T$ is symmetric.

(b) $A + B$ and $A - B$ are symmetric.

(c) $kA$ is symmetric.


It is not true, in general, that the product of symmetric matrices is symmetric. To see why this is so, let $A$ and $B$ be symmetric matrices with the same size. Then it follows from part (e) of Theorem 1.4.8 and the symmetry of $A$ and $B$ that

$$(AB)^T = B^TA^T = BA$$

Thus, $(AB)^T = AB$ if and only if $AB = BA$, that is, if and only if $A$ and $B$ commute. In summary, we have the following result.


THEOREM 1.7.3 

The product of two symmetric matrices is symmetric if and only if the matrices commute. 


EXAMPLE 5 Products of Symmetric Matrices 

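The first of the following equations shows a product of symmetric matrices that is not symmetric, and the second shows a product of symmetric matrices that is symmetric. We conclude that the factors in the first equation do not commute, but those in the second equation do. We leave it for you to verify that this is so.

$$\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} -4 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ -5 & 2 \end{bmatrix}$$

$$\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} -4 & 3 \\ 3 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$$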
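Theorem 1.7.3 can be seen at work on the two products of Example 5. A minimal sketch; `matmul` and `transpose` are our own helpers, and each product is checked both for symmetry and for whether its factors commute.

```python
def matmul(x, y):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


S = [[1, 2], [2, 3]]
T1 = [[-4, 1], [1, 0]]    # second factor in the first equation
T2 = [[-4, 3], [3, -1]]   # second factor in the second equation

for T in (T1, T2):
    P = matmul(S, T)
    commute = P == matmul(T, S)
    symmetric = P == transpose(P)
    print(P, commute, symmetric)
# [[-2, 1], [-5, 2]] False False
# [[2, 1], [1, 3]] True True
```

In both cases the two flags agree, as the theorem requires.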


Invertibility of Symmetric Matrices 


In general, a symmetric matrix need not be invertible. For example, a diagonal matrix with a zero on the main diagonal is symmetric but not invertible. However, the following theorem shows that if a symmetric matrix happens to be invertible, then its inverse must also be symmetric.


THEOREM 1.7.4 

If $A$ is an invertible symmetric matrix, then $A^{-1}$ is symmetric.


Proof Assume that $A$ is symmetric and invertible. From Theorem 1.4.9 and the fact that $A = A^T$, we have

$$(A^{-1})^T = (A^T)^{-1} = A^{-1}$$

which proves that $A^{-1}$ is symmetric.


Products $AA^T$ and $A^TA$


Matrix products of the form $AA^T$ and $A^TA$ arise in a variety of applications. If $A$ is an $m \times n$ matrix, then $A^T$ is an $n \times m$ matrix, so the products $AA^T$ and $A^TA$ are both square matrices: the matrix $AA^T$ has size $m \times m$, and the matrix $A^TA$ has size $n \times n$. Such products are always symmetric since

$$(AA^T)^T = (A^T)^TA^T = AA^T \quad \text{and} \quad (A^TA)^T = A^T(A^T)^T = A^TA$$


EXAMPLE 6 The Product of a Matrix and Its Transpose Is Symmetric 


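Let $A$ be the $2 \times 3$ matrix

$$A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix}$$

Then

$$A^TA = \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix} \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} = \begin{bmatrix} 10 & -2 & -11 \\ -2 & 4 & -8 \\ -11 & -8 & 41 \end{bmatrix}$$

$$AA^T = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix} = \begin{bmatrix} 21 & -17 \\ -17 & 34 \end{bmatrix}$$

Observe that $A^TA$ and $AA^T$ are symmetric as expected.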
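The sizes and symmetry claims can be reproduced for the matrix of Example 6. A short sketch with our own `matmul` and `transpose` helpers (plain Python lists, no libraries).

```python
def matmul(x, y):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


A = [[1, -2, 4], [3, 0, -5]]   # the 2 x 3 matrix of Example 6
At = transpose(A)
AtA = matmul(At, A)            # 3 x 3
AAt = matmul(A, At)            # 2 x 2
print(AtA)  # [[10, -2, -11], [-2, 4, -8], [-11, -8, 41]]
print(AAt)  # [[21, -17], [-17, 34]]
print(AtA == transpose(AtA), AAt == transpose(AAt))  # True True
```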



Later in this text, we will obtain general conditions on $A$ under which $AA^T$ and $A^TA$ are invertible. However, in the special case where $A$ is square, we have the following result.


THEOREM 1.7.5 

If $A$ is an invertible matrix, then $AA^T$ and $A^TA$ are also invertible.
















Proof Since $A$ is invertible, so is $A^T$ by Theorem 1.4.9. Thus $AA^T$ and $A^TA$ are invertible, since they are products of invertible matrices.


Concept Review 

Diagonal matrix 
Lower triangular matrix 
Upper triangular matrix 
Triangular matrix 
Symmetric matrix 

Skills 

Determine whether a diagonal matrix is invertible with no computations. 
Compute matrix products involving diagonal matrices by inspection. 

Determine whether a matrix is triangular. 

Understand how the transpose operation affects diagonal and triangular matrices. 
Understand how inversion affects diagonal and triangular matrices. 

Determine whether a matrix is a symmetric matrix. 


Exercise Set 1.7 


In Exercises 1-4, determine whether the given matrix is invertible.

1. $\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}$

Answer:

Invertible


2 . 


3 . 


4 0 0 
0 0 0 
0 0 5 

-1 0 0 

0 2 0 

0 0 | 


Answer: 










4. 


-10 0 

0 1 0 

0 0 3 

-10 0 0 
0 3 0 0 

0 0-3 0 

0 0 0 -2 


In Exercises 5-8, determine the product by inspection. 


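5. $\begin{bmatrix} 3 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ -4 & 1 \\ 2 & 5 \end{bmatrix}$

Answer:

$\begin{bmatrix} 6 & 3 \\ 4 & -1 \\ 4 & 10 \end{bmatrix}$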


6 . 


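1 2 •

-3 -1

1-1

m o

1 i

-4 0 0
0 3 0
0 0 2

7. $\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -3 \end{bmatrix} \begin{bmatrix} -3 & 2 & 0 & 4 & -4 \\ 1 & -5 & 3 & 0 & 3 \\ -6 & 2 & 2 & 2 & 2 \end{bmatrix}$

Answer:

$\begin{bmatrix} -15 & 10 & 0 & 20 & -20 \\ 2 & -10 & 6 & 0 & 6 \\ 18 & -6 & -6 & -6 & -6 \end{bmatrix}$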




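8. $\begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} 4 & -1 & 3 \\ 1 & 2 & 0 \\ -5 & 1 & -2 \end{bmatrix} \begin{bmatrix} -3 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 2 \end{bmatrix}$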


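In Exercises 9-12, find $A^2$, $A^{-2}$, and $A^{-k}$ (where $k$ is any integer) by inspection.

9. $A = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}$

Answer:

$A^2 = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}, \quad A^{-2} = \begin{bmatrix} 1 & 0 \\ 0 & \tfrac{1}{4} \end{bmatrix}, \quad A^{-k} = \begin{bmatrix} 1 & 0 \\ 0 & \tfrac{1}{(-2)^k} \end{bmatrix}$

10. $A = \begin{bmatrix} -6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}$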




12. $A = \begin{bmatrix} -2 & 0 & 0 & 0 \\ 0 & -4 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$

In Exercises 13-19, decide whether the given matrix is symmetric. 

13.1"-8 -8" 

0 . 

Answer: 

Not symmetric 


Answer: 

Symmetric 

16. $\begin{bmatrix} 3 & 4 \\ 4 & 0 \end{bmatrix}$

17. $\begin{bmatrix} 0 & 1 & 2 \\ 1 & 5 & -6 \\ 2 & 6 & 6 \end{bmatrix}$

Answer:

Not symmetric

is. r -i 3" 

-1 5 1 

1 7 

19. $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 3 & 0 & 0 \end{bmatrix}$

Answer:

Not symmetric


In Exercises 20-22, decide by inspection whether the given matrix is invertible. 

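20. $\begin{bmatrix} -1 & 2 & 4 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

21. $\begin{bmatrix} 0 & 1 & -2 & 5 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & -3 & 1 \\ 0 & 0 & 0 & 5 \end{bmatrix}$

Answer:

Not invertible

22. $\begin{bmatrix} 2 & 0 & 0 & 0 \\ -3 & -1 & 0 & 0 \\ -4 & -6 & 0 & 0 \\ 0 & 3 & 8 & -5 \end{bmatrix}$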

In Exercises 23-24, find all values of the unknown constant(s) in order for A to be symmetric. 



Answer: 


a = — 8 

24. $A = \begin{bmatrix} 2 & a - 2b + 2c & 2a + b + c \\ 3 & 5 & a + c \\ 0 & -2 & 7 \end{bmatrix}$

In Exercises 25-26, find all values of x in order for A to be invertible. 


A = 0 x + 2 x 3 

0 0 x-4_ 

Answer: 


$x \neq 1, -2, 4$



In Exercises 27-28, find a diagonal matrix A that satisfies the given condition. 

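27. $A^5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

Answer:

$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

28. $A^{-2} = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix}$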

29. Verify Theorem 1.7.1(b) for the product $AB$, where

$A = \begin{bmatrix} -1 & 2 & 5 \\ 0 & 1 & 3 \\ 0 & 0 & -4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -8 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}$

30. Verify Theorem 1.7.1(d) for the matrices $A$ and $B$ in Exercise 29.

31. Verify Theorem 1.7.4 for the given matrix $A$.

(b) $A = \begin{bmatrix} 1 & -2 & 3 \\ -2 & 1 & -7 \\ 3 & -7 & 4 \end{bmatrix}$

32. Let $A$ be an $n \times n$ symmetric matrix.

(a) Show that $A^2$ is symmetric.

(b) Show that $2A^2 - 3A + I$ is symmetric.

33. Prove: If $A^TA = A$, then $A$ is symmetric and $A = A^2$.

34. Find all $3 \times 3$ diagonal matrices $A$ that satisfy $A^2 - 3A - 4I = 0$.

35. Let $A = [a_{ij}]$ be an $n \times n$ matrix. Determine whether $A$ is symmetric.

(a) $a_{ij} = i^2 + j^2$

(b) $a_{ij} = i^2 - j^2$

(c) $a_{ij} = 2i + 2j$

(d) $a_{ij} = 2i^2 + 2j^3$

Answer:

(a) Yes

(b) No (unless $n = 1$)

(c) Yes

(d) No (unless $n = 1$)

36. On the basis of your experience with Exercise 35, devise a general test that can be applied to a formula for $a_{ij}$ to determine whether $A = [a_{ij}]$ is symmetric.

37. A square matrix $A$ is called skew-symmetric if $A^T = -A$. Prove:

(a) If $A$ is an invertible skew-symmetric matrix, then $A^{-1}$ is skew-symmetric.

(b) If $A$ and $B$ are skew-symmetric matrices, then so are $A^T$, $A + B$, $A - B$, and $kA$ for any scalar $k$.

(c) Every square matrix $A$ can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Note the identity $A = \tfrac{1}{2}(A + A^T) + \tfrac{1}{2}(A - A^T)$.]

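In Exercises 38-39, fill in the missing entries (marked with $\times$) to produce a skew-symmetric matrix.

38. $A = \begin{bmatrix} \times & \times & 4 \\ 0 & \times & \times \\ \times & -1 & \times \end{bmatrix}$

39. $A = \begin{bmatrix} \times & 0 & \times \\ \times & \times & -4 \\ 8 & \times & \times \end{bmatrix}$

Answer:

$A = \begin{bmatrix} 0 & 0 & -8 \\ 0 & 0 & -4 \\ 8 & 4 & 0 \end{bmatrix}$

40. Find all values of $a$, $b$, $c$, and $d$ for which $A$ is skew-symmetric.

$A = \begin{bmatrix} 0 & 2a - 3b + c & 2a - 5b + 5c \\ -2 & 0 & 5a - 8b + 6c \\ -3 & -5 & d \end{bmatrix}$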

41. We showed in the text that the product of symmetric matrices is symmetric if and only if the matrices commute. Is the product of commuting skew-symmetric matrices skew-symmetric? Explain. [Note: See Exercise 37 for the definition of skew-symmetric.]

42. If the $n \times n$ matrix $A$ can be expressed as $A = LU$, where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix, then the linear system $Ax = b$ can be expressed as $LUx = b$ and can be solved in two steps:

Step 1. Let $Ux = y$, so that $LUx = b$ can be expressed as $Ly = b$. Solve this system.

Step 2. Solve the system $Ux = y$ for $x$.

In each part, use this two-step method to solve the given system.


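(a) $\begin{bmatrix} 1 & 0 & 0 \\ -2 & 3 & 0 \\ 2 & 4 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix}$

(b) $\begin{bmatrix} 2 & 0 & 0 \\ 4 & 1 & 0 \\ -3 & -2 & 3 \end{bmatrix} \begin{bmatrix} 3 & -5 & 2 \\ 0 & 4 & 1 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ -5 \\ 2 \end{bmatrix}$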
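The two-step method amounts to forward substitution (Step 1) followed by back substitution (Step 2). Below is a minimal Python sketch, assuming that $L$ and $U$ have nonzero diagonal entries so that both triangular systems have unique solutions. The function names and the small $L$, $U$, $b$ used to exercise them are our own illustrative choices, deliberately not the systems of parts (a) and (b).

```python
from fractions import Fraction


def forward_substitution(L, b):
    """Step 1: solve L y = b for lower triangular L (nonzero diagonal)."""
    n = len(L)
    y = [Fraction(0)] * n
    for i in range(n):
        # Entries L[i][j] with j < i multiply components of y already found.
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (Fraction(b[i]) - s) / L[i][i]
    return y


def back_substitution(U, y):
    """Step 2: solve U x = y for upper triangular U (nonzero diagonal)."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        # Entries U[i][j] with j > i multiply components of x already found.
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


L = [[1, 0, 0], [2, 1, 0], [-1, 3, 1]]   # illustrative lower triangular factor
U = [[2, 1, -1], [0, 1, 2], [0, 0, 3]]   # illustrative upper triangular factor
b = [1, 10, 32]
y = forward_substitution(L, b)   # Step 1
x = back_substitution(U, y)      # Step 2
print([str(v) for v in y], [str(v) for v in x])  # ['1', '8', '9'] ['1', '2', '3']
```

Each step only ever divides by a diagonal entry, which is why no row operations are needed once $A = LU$ is known.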


43. Find an upper triangular matrix that satisfies

$A^3 = \begin{bmatrix} 1 & 30 \\ 0 & -8 \end{bmatrix}$

Answer:

$A = \begin{bmatrix} 1 & 10 \\ 0 & -2 \end{bmatrix}$


True-False Exercises 


In parts (a)-(m) determine whether the statement is true or false, and justify your answer. 


(a) The transpose of a diagonal matrix is a diagonal matrix. 




























Answer: 


True 

(b) The transpose of an upper triangular matrix is an upper triangular matrix. 

Answer: 

False 

(c) The sum of an upper triangular matrix and a lower triangular matrix is a diagonal matrix. 

Answer: 

False 

(d) All entries of a symmetric matrix are determined by the entries occurring on and above the main diagonal. 
Answer: 

True 

(e) All entries of an upper triangular matrix are determined by the entries occurring on and above the main diagonal. 
Answer: 

True 

(f) The inverse of an invertible lower triangular matrix is an upper triangular matrix. 

Answer: 

False 

(g) A diagonal matrix is invertible if and only if all of its diagonal entries are positive. 

Answer: 

False 

(h) The sum of a diagonal matrix and a lower triangular matrix is a lower triangular matrix. 

Answer: 

True 

(i) A matrix that is both symmetric and upper triangular must be a diagonal matrix.

Answer: 

True 

(j) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is symmetric, then $A$ and $B$ are symmetric.

Answer: 

False 

(k) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is upper triangular, then $A$ and $B$ are upper triangular.

Answer: 

False 

(l) If $A^2$ is a symmetric matrix, then $A$ is a symmetric matrix.



Answer: 


False 

(m) If $kA$ is a symmetric matrix for some $k \neq 0$, then $A$ is a symmetric matrix.

Answer: 

True 




1.8 Applications of Linear Systems 

In this section we will discuss some relatively brief applications of linear systems. These are but a small sample of the wide 
variety of real-world problems to which our study of linear systems is applicable. 


Network Analysis 

The concept of a network appears in a variety of applications. Loosely stated, a network is a set of branches through which 
something “flows.” For example, the branches might be electrical wires through which electricity flows, pipes through 
which water or oil flows, traffic lanes through which vehicular traffic flows, or economic linkages through which money 
flows, to name a few possibilities. 

In most networks, the branches meet at points, called nodes or junctions, where the flow divides. For example, in an electrical network, nodes occur where three or more wires join, in a traffic network they occur at street intersections, and in a financial network they occur at banking centers where incoming money is distributed to individuals or other institutions.

In the study of networks, there is generally some numerical measure of the rate at which the medium flows through a 
branch. For example, the flow rate of electricity is often measured in amperes, the flow rate of water or oil in gallons per 
minute, the flow rate of traffic in vehicles per hour, and the flow rate of European currency in millions of Euros per day. 

We will restrict our attention to networks in which there is flow conservation at each node, by which we mean that the rate of flow into any node is equal to the rate of flow out of that node. This ensures that the flow medium does not build up at the nodes and block the free movement of the medium through the network.

A common problem in network analysis is to use known flow rates in certain branches to find the flow rates in all of the 
branches. Here is an example. 

EXAMPLE 1 Network Analysis Using Linear Systems 

Figure 1.8.1 shows a network with four nodes in which the flow rate and direction of flow in certain 
branches are known. Find the flow rates and directions of flow in the remaining branches. 


Figure 1.8.1



Solution As illustrated in Figure 1.8.2, we have assigned arbitrary directions to the unknown flow rates $x_1$, $x_2$, and $x_3$. We need not be concerned if some of the directions are incorrect, since an incorrect direction will be signaled by a negative value for the flow rate when we solve for the unknowns.


Figure 1.8.2



It follows from the conservation of flow at node A that

$$x_1 + x_2 = 30$$

Similarly, at the other nodes we have

$$\begin{aligned} x_2 + x_3 &= 35 && \text{(node B)} \\ x_3 + 15 &= 60 && \text{(node C)} \\ x_1 + 15 &= 55 && \text{(node D)} \end{aligned}$$

These four conditions produce the linear system

$$\begin{aligned} x_1 + x_2 &= 30 \\ x_2 + x_3 &= 35 \\ x_3 &= 45 \\ x_1 &= 40 \end{aligned}$$

which we can now try to solve for the unknown flow rates. In this particular case the system is sufficiently simple that it can be solved by inspection (work from the bottom up). We leave it for you to confirm that the solution is

$$x_1 = 40, \quad x_2 = -10, \quad x_3 = 45$$

The fact that $x_2$ is negative tells us that the direction assigned to that flow in Figure 1.8.2 is incorrect; that is, the flow in that branch is into node A.
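Working from the bottom up, as suggested, takes three subtractions, and the one node equation not used along the way then serves as a check. A tiny Python illustration of that order of solution (the variable names simply mirror $x_1$, $x_2$, $x_3$):

```python
# Example 1 solved from the bottom up.
x3 = 60 - 15   # node C: x3 + 15 = 60
x1 = 55 - 15   # node D: x1 + 15 = 55
x2 = 30 - x1   # node A: x1 + x2 = 30
print(x1, x2, x3)     # 40 -10 45
print(x2 + x3 == 35)  # node B balances as well: True
```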


EXAMPLE 2 Design of Traffic Patterns 

The network in Figure 1.8.3 shows a proposed plan for the traffic flow around a new park that will house the 
Liberty Bell in Philadelphia, Pennsylvania. The plan calls for a computerized traffic light at the north exit on 
Fifth Street, and the diagram indicates the average number of vehicles per hour that are expected to flow in 
and out of the streets that border the complex. All streets are one-way. 

(a) How many vehicles per hour should the traffic light let through to ensure that the average number of vehicles per hour flowing into the complex is the same as the average number of vehicles flowing out?

(b) Assuming that the traffic light has been set to balance the total flow in and out of the complex, what can you say about the average number of vehicles per hour that will flow along the streets that border the complex?

Figure 1.8.3


Solution

(a) If, as indicated in Figure 1.8.3b, we let $x$ denote the number of vehicles per hour that the traffic light must let through, then the total number of vehicles per hour that flow in and out of the complex will be

$$\begin{aligned} \text{Flowing in: } & 500 + 400 + 600 + 200 = 1700 \\ \text{Flowing out: } & x + 700 + 400 \end{aligned}$$

Equating the flows in and out shows that the traffic light should let $x = 600$ vehicles per hour pass through.

(b) To avoid traffic congestion, the flow in must equal the flow out at each intersection. For this to happen, the following conditions must be satisfied:

$$\begin{array}{c|rcl} \text{Intersection} & \text{Flow In} & & \text{Flow Out} \\ \hline A & 400 + 600 & = & x_1 + x_2 \\ B & x_2 + x_3 & = & 400 + x \\ C & 500 + 200 & = & x_3 + x_4 \\ D & x_1 + x_4 & = & 700 \end{array}$$

Thus, with $x = 600$, as computed in part (a), we obtain the following linear system:

$$\begin{aligned} x_1 + x_2 &= 1000 \\ x_2 + x_3 &= 1000 \\ x_3 + x_4 &= 700 \\ x_1 + x_4 &= 700 \end{aligned}$$

We leave it for you to show that the system has infinitely many solutions and that these are given by the parametric equations

$$x_1 = 700 - t, \quad x_2 = 300 + t, \quad x_3 = 700 - t, \quad x_4 = t \tag{1}$$

However, the parameter $t$ is not completely arbitrary here, since there are physical constraints to be considered. For example, the average flow rates must be nonnegative since we have assumed the streets to be one-way, and a negative flow rate would indicate a flow in the wrong direction. This being the case, we see from (1) that $t$ can be any real number that satisfies $0 \le t \le 700$, which implies that the average flow rates along the streets will fall in the ranges

$$0 \le x_1 \le 700, \quad 300 \le x_2 \le 1000, \quad 0 \le x_3 \le 700, \quad 0 \le x_4 \le 700$$
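The parametric solution (1) and the nonnegativity constraint can be spot-checked numerically. A minimal sketch; the helper names are our own, and it samples only integer values of $t$, whereas the text allows any real $t$.

```python
def flows(t):
    """Parametric solution (1): returns (x1, x2, x3, x4) for parameter t."""
    return (700 - t, 300 + t, 700 - t, t)


def balanced(x1, x2, x3, x4, x=600):
    """Flow in equals flow out at intersections A, B, C and D."""
    return (400 + 600 == x1 + x2 and
            x2 + x3 == 400 + x and
            500 + 200 == x3 + x4 and
            x1 + x4 == 700)


def feasible(t):
    """One-way streets: every flow rate must be nonnegative."""
    return all(v >= 0 for v in flows(t))


# Integer samples only; the text allows any real t.
print(all(balanced(*flows(t)) for t in range(-100, 801)))  # True
print([t for t in (-1, 0, 350, 700, 701) if feasible(t)])  # [0, 350, 700]
```

Every value of $t$ satisfies the balance equations, so only the physical constraint $0 \le t \le 700$ narrows the choice.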















Electrical Circuits 


Next, we will show how network analysis can be used to analyze electrical circuits consisting of batteries and resistors. A battery is a source of electric energy, and a resistor, such as a lightbulb, is an element that dissipates electric energy. Figure 1.8.4 shows a schematic diagram of a circuit with one battery (represented by the battery symbol), one resistor (represented by the zigzag resistor symbol), and a switch. The battery has a positive pole (+) and a negative pole (-). When the switch is closed, electrical current is considered to flow from the positive pole of the battery, through the resistor, and back to the negative pole (indicated by the arrowhead in the figure).

Figure 1.8.4

Electrical current, which is a flow of electrons through wires, behaves much like the flow of water through pipes. A battery acts like a pump that creates "electrical pressure" to increase the flow rate of electrons, and a resistor acts like a restriction in a pipe that reduces the flow rate of electrons. The technical term for electrical pressure is electrical potential; it is commonly measured in volts (V). The degree to which a resistor reduces the electrical potential is called its resistance and is commonly measured in ohms (Ω). The rate of flow of electrons in a wire is called current and is commonly measured in amperes (also called amps) (A). The precise effect of a resistor is given by the following law:

Ohm's Law

If a current of $I$ amperes passes through a resistor with a resistance of $R$ ohms, then there is a resulting drop of $E$ volts in electrical potential that is the product of the current and resistance; that is,

$$E = IR$$


A typical electrical network will have multiple batteries and resistors joined by some configuration of wires. A point at which three or more wires in a network are joined is called a node (or junction point). A branch is a wire connecting two nodes, and a closed loop is a succession of connected branches that begin and end at the same node. For example, the electrical network in Figure 1.8.5 has two nodes and three closed loops: two inner loops and one outer loop. As current flows through an electrical network, it undergoes increases and decreases in electrical potential, called voltage rises and voltage drops, respectively. The behavior of the current at the nodes and around closed loops is governed by two fundamental laws:



Figure 1.8.5 










Kirchhoff's Current Law

The sum of the currents flowing into any node is equal to the sum of the currents flowing out.

Kirchhoff's Voltage Law

In one traversal of any closed loop, the sum of the voltage rises equals the sum of the voltage drops.

Kirchhoff's current law is a restatement of the principle of flow conservation at a node that was stated for general networks. Thus, for example, the currents at the top node in Figure 1.8.6 satisfy the equation $I_1 = I_2 + I_3$.

Figure 1.8.6



In circuits with multiple loops and batteries there is usually no way to tell in advance which way the currents are flowing, so the usual procedure in circuit analysis is to assign arbitrary directions to the current flows in the branches and let the mathematical computations determine whether the assignments are correct. In addition to assigning directions to the current flows, Kirchhoff's voltage law requires a direction of travel for each closed loop. The choice is arbitrary, but for consistency we will always take this direction to be clockwise (Figure 1.8.7). We also make the following conventions:

A voltage drop occurs at a resistor if the direction assigned to the current through the resistor is the same as the direction assigned to the loop, and a voltage rise occurs at a resistor if the direction assigned to the current through the resistor is the opposite to that assigned to the loop.

A voltage rise occurs at a battery if the direction assigned to the loop is from - to + through the battery, and a voltage drop occurs at a battery if the direction assigned to the loop is from + to - through the battery.

If you follow these conventions when calculating currents, then those currents whose directions were assigned correctly will have positive values and those whose directions were assigned incorrectly will have negative values.

Figure 1.8.7


EXAMPLE 3 A Circuit with One Closed Loop 












Determine the current $I$ in the circuit shown in Figure 1.8.8.

Figure 1.8.8

Solution Since the direction assigned to the current through the resistor is the same as the direction of the loop, there is a voltage drop at the resistor. By Ohm's law this voltage drop is $E = IR = 3I$. Also, since the direction assigned to the loop is from - to + through the battery, there is a voltage rise of 6 volts at the battery. Thus, it follows from Kirchhoff's voltage law that

$$3I = 6$$

from which we conclude that the current is $I = 2$ A. Since $I$ is positive, the direction assigned to the current flow is correct.


EXAMPLE 4 A Circuit with Three Closed Loops 

Determine the currents $I_1$, $I_2$, and $I_3$ in the circuit shown in Figure 1.8.9.

Figure 1.8.9

Solution Using the assigned directions for the currents, Kirchhoff's current law provides one equation for each node:

$$\begin{array}{c|rcl} \text{Node} & \text{Current In} & & \text{Current Out} \\ \hline a & I_1 + I_2 & = & I_3 \\ b & I_3 & = & I_1 + I_2 \end{array}$$

However, these equations are really the same, since both can be expressed as

$$I_1 + I_2 - I_3 = 0 \tag{2}$$









Gustav Kirchhoff (1824-1887) 

The German physicist Gustav Kirchhoff was a student of Gauss. His work on 
Kirchhoff s laws, announced in 1854, was a major advance in the calculation of currents, voltages, 
and resistances of electrical circuits. Kirchhoff was severely disabled and spent most of his life on 
crutches or in a wheelchair. 
To find unique values for the currents we will need two more equations, which we will obtain from
Kirchhoff's voltage law. We can see from the network diagram that there are three closed loops: a left inner
loop containing the 50 V battery, a right inner loop containing the 30 V battery, and an outer loop that
contains both batteries. Thus, Kirchhoff's voltage law will actually produce three equations. With a
clockwise traversal of the loops, the voltage rises and drops in these loops are as follows:

                     Voltage Rises               Voltage Drops
Left Inside Loop     \(50\)                      \(5I_1 + 20I_3\)
Right Inside Loop    \(30 + 10I_2 + 20I_3\)      \(0\)
Outside Loop         \(30 + 50 + 10I_2\)         \(5I_1\)

These conditions can be rewritten as

\[ \begin{aligned} 5I_1 + 20I_3 &= 50 \\ 10I_2 + 20I_3 &= -30 \\ 5I_1 - 10I_2 &= 80 \end{aligned} \qquad (3) \]

However, the last equation is superfluous, since it is the difference of the first two. Thus, if we combine (2)
and the first two equations in (3), we obtain the following linear system of three equations in the three
unknown currents:

\[ \begin{aligned} I_1 + I_2 - I_3 &= 0 \\ 5I_1 + 20I_3 &= 50 \\ 10I_2 + 20I_3 &= -30 \end{aligned} \]

We leave it for you to solve this system and show that \(I_1 = 6\) A, \(I_2 = -5\) A, and \(I_3 = 1\) A. The fact that \(I_2\)
is negative tells us that the direction of this current is opposite to that indicated in Figure 1.8.9.
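The hand solution can be checked numerically. The sketch below assumes NumPy is available. It orders the unknowns as \((I_1, I_2, I_3)\) and passes the three-equation system above to `numpy.linalg.solve`.

```python
import numpy as np

# Unknowns ordered as (I1, I2, I3):
#    I1 +   I2 -   I3 =   0   (Kirchhoff's current law, Equation 2)
#   5I1         + 20I3 =  50   (left inner loop)
#         10I2 + 20I3 = -30   (right inner loop)
A = np.array([[1.0, 1.0, -1.0],
              [5.0, 0.0, 20.0],
              [0.0, 10.0, 20.0]])
b = np.array([0.0, 50.0, -30.0])

currents = np.linalg.solve(A, b)
# currents should be close to 6, -5, 1 (amperes)
```

A negative component, here the second one, means that the current flows opposite to the direction assigned in the figure.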


Balancing Chemical Equations 



Chemical compounds are represented by chemical formulas that describe the atomic makeup of their molecules. For
example, water is composed of two hydrogen atoms and one oxygen atom, so its chemical formula is H₂O; and stable
oxygen is composed of two oxygen atoms, so its chemical formula is O₂.

When chemical compounds are combined under the right conditions, the atoms in their molecules rearrange to form new 
compounds. For example, when methane burns, the methane (CH₄) and stable oxygen (O₂) react to form carbon dioxide
(CO₂) and water (H₂O). This is indicated by the chemical equation

\[ \mathrm{CH_4} + \mathrm{O_2} \longrightarrow \mathrm{CO_2} + \mathrm{H_2O} \qquad (4) \]

The molecules to the left of the arrow are called the reactants and those to the right the products. In this equation the plus 
signs serve to separate the molecules and are not intended as algebraic operations. However, this equation does not tell the 
whole story, since it fails to account for the proportions of molecules required for a complete reaction (no reactants left 
over). For example, we can see from the right side of (4) that to produce one molecule of carbon dioxide and one molecule
of water, one needs three oxygen atoms for each carbon atom. However, from the left side of (4) we see that one molecule of
methane and one molecule of stable oxygen have only two oxygen atoms for each carbon atom. Thus, on the reactant side 
the ratio of methane to stable oxygen cannot be one-to-one in a complete reaction. 

A chemical equation is said to be balanced if for each type of atom in the reaction, the same number of atoms appears on 
each side of the arrow. For example, the balanced version of Equation (4) is

\[ \mathrm{CH_4} + 2\mathrm{O_2} \longrightarrow \mathrm{CO_2} + 2\mathrm{H_2O} \qquad (5) \]

by which we mean that one methane molecule combines with two stable oxygen molecules to produce one carbon dioxide 
molecule and two water molecules. In theory, one could multiply this equation through by any positive integer. For 
example, multiplying through by 2 yields the balanced chemical equation 

\[ 2\mathrm{CH_4} + 4\mathrm{O_2} \longrightarrow 2\mathrm{CO_2} + 4\mathrm{H_2O} \]

However, the standard convention is to use the smallest positive integers that will balance the equation. 

Equation (4) is sufficiently simple that it could have been balanced by trial and error, but for more complicated chemical
equations we will need a systematic method. There are various methods that can be used, but we will give one that uses
systems of linear equations. To illustrate the method let us reexamine Equation (4). To balance this equation we must find
positive integers \(x_1\), \(x_2\), \(x_3\), and \(x_4\) such that

\[ x_1(\mathrm{CH_4}) + x_2(\mathrm{O_2}) \longrightarrow x_3(\mathrm{CO_2}) + x_4(\mathrm{H_2O}) \qquad (6) \]

For each of the atoms in the equation, the number of atoms on the left must be equal to the number of atoms on the right. 
Expressing this in tabular form we have 



            Left Side          Right Side
Carbon      \(x_1\)       =    \(x_3\)
Hydrogen    \(4x_1\)      =    \(2x_4\)
Oxygen      \(2x_2\)      =    \(2x_3 + x_4\)

from which we obtain the homogeneous linear system

\[ \begin{aligned} x_1 - x_3 &= 0 \\ 4x_1 - 2x_4 &= 0 \\ 2x_2 - 2x_3 - x_4 &= 0 \end{aligned} \]


The augmented matrix for this system is

\[ \left[\begin{array}{cccc|c} 1 & 0 & -1 & 0 & 0 \\ 4 & 0 & 0 & -2 & 0 \\ 0 & 2 & -2 & -1 & 0 \end{array}\right] \]

We leave it for you to show that the reduced row echelon form of this matrix is

\[ \left[\begin{array}{cccc|c} 1 & 0 & 0 & -\tfrac{1}{2} & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & -\tfrac{1}{2} & 0 \end{array}\right] \]

from which we conclude that the general solution of the system is

\[ x_1 = t/2, \quad x_2 = t, \quad x_3 = t/2, \quad x_4 = t \]

where \(t\) is arbitrary. The smallest positive integer values for the unknowns occur when we let \(t = 2\), so the equation can be
balanced by letting \(x_1 = 1\), \(x_2 = 2\), \(x_3 = 1\), \(x_4 = 2\). This agrees with our earlier conclusions, since substituting these
values into Equation (6) yields Equation (5).
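The reduction can also be carried out in exact rational arithmetic, which avoids floating-point rounding in the reduced row echelon form. The following is a minimal sketch using Python's standard `fractions` and `math` modules (Python 3.9 or later for multi-argument `gcd` and `lcm`). The names `rref` and `smallest_balancing_integers` are hypothetical, chosen for illustration. The second function also assumes, as in this example, that the last unknown is the only free variable.

```python
from fractions import Fraction
from math import gcd, lcm

def rref(rows):
    """Gauss-Jordan elimination in exact fractions; returns (reduced matrix, pivot columns)."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0  # row that receives the next leading 1
    for c in range(n_cols):
        # Find a nonzero entry in column c at or below row r.
        k = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if k is None:
            continue
        m[r], m[k] = m[k], m[r]
        p = m[r][c]
        m[r] = [v / p for v in m[r]]            # scale to get a leading 1
        for i in range(n_rows):                  # clear the rest of column c
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return m, pivots

def smallest_balancing_integers(coeffs):
    """Smallest positive integers x with coeffs x = 0, assuming only the last unknown is free."""
    m, pivots = rref(coeffs)
    n = len(coeffs[0])
    if [c for c in range(n) if c not in pivots] != [n - 1]:
        raise ValueError("sketch only handles a single free variable in the last position")
    x = [Fraction(0)] * n
    x[-1] = Fraction(1)                  # parameter t = 1
    for row, c in enumerate(pivots):
        x[c] = -m[row][-1]               # row reads x_c + m[row][-1] * t = 0
    scale = lcm(*(v.denominator for v in x))
    ints = [int(v * scale) for v in x]   # clear denominators
    g = gcd(*ints)
    return [v // g for v in ints]

# Coefficient matrix of the carbon, hydrogen, and oxygen equations for (6).
A = [[1, 0, -1, 0],
     [4, 0, 0, -2],
     [0, 2, -2, -1]]
print(smallest_balancing_integers(A))  # [1, 2, 1, 2]
```

The same function can be applied to the five-equation system of Example 5 below. For the rows [1, 0, -3, 0], [1, 0, 0, -1], [0, 3, 0, -1], [0, 1, -1, 0], [0, 4, -4, 0] it returns [3, 1, 1, 3].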

EXAMPLE 5 Balancing Chemical Equations Using Linear Systems 

Balance the chemical equation 

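\[ \underbrace{\mathrm{HCl}}_{\text{hydrochloric acid}} + \underbrace{\mathrm{Na_3PO_4}}_{\text{sodium phosphate}} \longrightarrow \underbrace{\mathrm{H_3PO_4}}_{\text{phosphoric acid}} + \underbrace{\mathrm{NaCl}}_{\text{sodium chloride}} \]

Let \(x_1\), \(x_2\), \(x_3\), and \(x_4\) be positive integers that balance the equation

\[ x_1(\mathrm{HCl}) + x_2(\mathrm{Na_3PO_4}) \longrightarrow x_3(\mathrm{H_3PO_4}) + x_4(\mathrm{NaCl}) \qquad (7) \]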

Equating the number of atoms of each type on the two sides yields 

\(x_1 = 3x_3\)      Hydrogen (H)
\(x_1 = x_4\)       Chlorine (Cl)
\(3x_2 = x_4\)      Sodium (Na)
\(x_2 = x_3\)       Phosphorus (P)
\(4x_2 = 4x_3\)     Oxygen (O)

from which we obtain the homogeneous linear system

\[ \begin{aligned} x_1 - 3x_3 &= 0 \\ x_1 - x_4 &= 0 \\ 3x_2 - x_4 &= 0 \\ x_2 - x_3 &= 0 \\ 4x_2 - 4x_3 &= 0 \end{aligned} \]

We leave it for you to show that the reduced row echelon form of the augmented matrix for this system is

\[ \left[\begin{array}{cccc|c} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & -\tfrac{1}{3} & 0 \\ 0 & 0 & 1 & -\tfrac{1}{3} & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right] \]

from which we conclude that the general solution of the system is

\[ x_1 = t, \quad x_2 = t/3, \quad x_3 = t/3, \quad x_4 = t \]

where \(t\) is arbitrary. To obtain the smallest positive integers that balance the equation, we let \(t = 3\), in which
case we obtain \(x_1 = 3\), \(x_2 = 1\), \(x_3 = 1\), and \(x_4 = 3\). Substituting these values in (7) produces the balanced
equation

\[ 3\mathrm{HCl} + \mathrm{Na_3PO_4} \longrightarrow \mathrm{H_3PO_4} + 3\mathrm{NaCl} \]
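However the coefficients are found, a balanced equation can be checked by totaling the atoms on each side. This minimal sketch types in the atom counts for each molecule by hand and does not parse chemical formulas. `side_total`, `is_balanced`, and the molecule constants are hypothetical names used only for illustration.

```python
from collections import Counter

# Atom counts per molecule, entered by hand for this example.
HCL = Counter({"H": 1, "Cl": 1})
NA3PO4 = Counter({"Na": 3, "P": 1, "O": 4})
H3PO4 = Counter({"H": 3, "P": 1, "O": 4})
NACL = Counter({"Na": 1, "Cl": 1})

def side_total(terms):
    """Total atom counts for a list of (coefficient, molecule) pairs."""
    total = Counter()
    for coeff, molecule in terms:
        for atom, n in molecule.items():
            total[atom] += coeff * n
    return total

def is_balanced(reactants, products):
    """True when every type of atom appears equally often on both sides."""
    return side_total(reactants) == side_total(products)

print(is_balanced([(3, HCL), (1, NA3PO4)], [(1, H3PO4), (3, NACL)]))  # True
print(is_balanced([(1, HCL), (1, NA3PO4)], [(1, H3PO4), (1, NACL)]))  # False
```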


Polynomial Interpolation 

An important problem in various applications is to find a polynomial whose graph passes through a specified set of points 
in the plane; this is called an interpolating polynomial for the points. The simplest example of such a problem is to find a 
linear polynomial 


\[ p(x) = ax + b \qquad (8) \]

whose graph passes through two known distinct points, \((x_1, y_1)\) and \((x_2, y_2)\), in the xy-plane (Figure 1.8.10). You have
probably encountered various methods in analytic geometry for finding the equation of a line through two points, but here 
we will give a method based on linear systems that can be adapted to general polynomial interpolation. 

Figure 1.8.10


The graph of (8) is the line \(y = ax + b\), and for this line to pass through the points \((x_1, y_1)\) and \((x_2, y_2)\), we must have

\[ y_1 = ax_1 + b \quad \text{and} \quad y_2 = ax_2 + b \]

Therefore, the unknown coefficients \(a\) and \(b\) can be obtained by solving the linear system

\[ \begin{aligned} ax_1 + b &= y_1 \\ ax_2 + b &= y_2 \end{aligned} \]

We don't need any fancy methods to solve this system: the value of \(a\) can be obtained by subtracting the equations to
eliminate \(b\), and then the value of \(a\) can be substituted into either equation to find \(b\). We leave it as an exercise for you to
find \(a\) and \(b\) and then show that they can be expressed in the form

\[ a = \frac{y_2 - y_1}{x_2 - x_1} \quad \text{and} \quad b = \frac{y_1x_2 - y_2x_1}{x_2 - x_1} \qquad (9) \]

provided \(x_1 \neq x_2\). Thus, for example, the line \(y = ax + b\) that passes through the points

\[ (2, 1) \quad \text{and} \quad (5, 4) \]

can be obtained by taking \((x_1, y_1) = (2, 1)\) and \((x_2, y_2) = (5, 4)\), in which case (9) yields

\[ a = \frac{4 - 1}{5 - 2} = 1 \quad \text{and} \quad b = \frac{(1)(5) - (4)(2)}{5 - 2} = -1 \]

Therefore, the equation of the line is

\[ y = x - 1 \]

(Figure 1.8.11).
Figure 1.8.11
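Formula (9) translates directly into a few lines of code. In this sketch `line_through` is a hypothetical helper name. It rejects points with equal x-coordinates, which is exactly the case excluded by the proviso \(x_1 \neq x_2\).

```python
def line_through(p1, p2):
    """Coefficients (a, b) of y = a*x + b through two points, using formula (9)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have distinct x-coordinates")
    a = (y2 - y1) / (x2 - x1)
    b = (y1 * x2 - y2 * x1) / (x2 - x1)
    return a, b

print(line_through((2, 1), (5, 4)))  # (1.0, -1.0), i.e. y = x - 1
```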


Now let us consider the more general problem of finding a polynomial whose graph passes through \(n\) points with distinct
x-coordinates

\[ (x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_n, y_n) \qquad (10) \]

Since there are \(n\) conditions to be satisfied, intuition suggests that we should begin by looking for a polynomial of the form

\[ p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_{n-1}x^{n-1} \qquad (11) \]

since a polynomial of this form has \(n\) coefficients that are at our disposal to satisfy the \(n\) conditions. However, we want to
allow for cases where the points may lie on a line or have some other configuration that would make it possible to use a
polynomial whose degree is less than \(n - 1\); thus, we allow for the possibility that \(a_{n-1}\) and other coefficients in (11) may
be zero.

The following theorem, which we will prove later in the text, is the basic result on polynomial interpolation. 


Polynomial Interpolation 

Given any \(n\) points in the xy-plane that have distinct x-coordinates, there is a unique polynomial of degree \(n - 1\)
or less whose graph passes through those points.


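Let us now consider how we might go about finding the interpolating polynomial (11) whose graph passes through the points
in (10). Since the graph of this polynomial is the graph of the equation

\[ y = a_0 + a_1x + a_2x^2 + \cdots + a_{n-1}x^{n-1} \qquad (12) \]

it follows that the coordinates of the points must satisfy

\[ \begin{aligned} a_0 + a_1x_1 + a_2x_1^2 + \cdots + a_{n-1}x_1^{n-1} &= y_1 \\ a_0 + a_1x_2 + a_2x_2^2 + \cdots + a_{n-1}x_2^{n-1} &= y_2 \\ &\ \,\vdots \\ a_0 + a_1x_n + a_2x_n^2 + \cdots + a_{n-1}x_n^{n-1} &= y_n \end{aligned} \qquad (13) \]

In these equations the values of the \(x\)'s and \(y\)'s are assumed to be known, so we can view this as a linear system in the
unknowns \(a_0, a_1, \ldots, a_{n-1}\). From this point of view the augmented matrix for the system is

\[ \left[\begin{array}{ccccc|c} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} & y_1 \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} & y_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} & y_n \end{array}\right] \qquad (14) \]

and hence the interpolating polynomial can be found by reducing this matrix to reduced row echelon form (Gauss-Jordan
elimination).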

EXAMPLE 6 Polynomial Interpolation by Gauss-Jordan Elimination 

Find a cubic polynomial whose graph passes through the points

\[ (1, 3),\ (2, -2),\ (3, -5),\ (4, 0) \]

Since there are four points, we will use an interpolating polynomial of degree \(n - 1 = 3\). Denote this
polynomial by

\[ p(x) = a_0 + a_1x + a_2x^2 + a_3x^3 \]

and denote the x- and y-coordinates of the given points by

\[ x_1 = 1,\ x_2 = 2,\ x_3 = 3,\ x_4 = 4 \quad \text{and} \quad y_1 = 3,\ y_2 = -2,\ y_3 = -5,\ y_4 = 0 \]

Thus, it follows from (14) that the augmented matrix for the linear system in the unknowns \(a_0\), \(a_1\), \(a_2\), and \(a_3\)
is

\[ \left[\begin{array}{cccc|c} 1 & x_1 & x_1^2 & x_1^3 & y_1 \\ 1 & x_2 & x_2^2 & x_2^3 & y_2 \\ 1 & x_3 & x_3^2 & x_3^3 & y_3 \\ 1 & x_4 & x_4^2 & x_4^3 & y_4 \end{array}\right] = \left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 3 \\ 1 & 2 & 4 & 8 & -2 \\ 1 & 3 & 9 & 27 & -5 \\ 1 & 4 & 16 & 64 & 0 \end{array}\right] \]

We leave it for you to confirm that the reduced row echelon form of this matrix is

\[ \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 4 \\ 0 & 1 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & -5 \\ 0 & 0 & 0 & 1 & 1 \end{array}\right] \]

from which it follows that \(a_0 = 4\), \(a_1 = 3\), \(a_2 = -5\), \(a_3 = 1\). Thus, the interpolating polynomial is

\[ p(x) = 4 + 3x - 5x^2 + x^3 \]

The graph of this polynomial and the given points are shown in Figure 1.8.12. 
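The computation in (14) can be sketched with NumPy, assuming NumPy is available. `numpy.vander(..., increasing=True)` builds the columns \(1, x, x^2, \ldots\) that make up the coefficient part of (14). The function name `interpolating_coefficients` is hypothetical.

```python
import numpy as np

def interpolating_coefficients(xs, ys):
    """Return a_0, ..., a_{n-1} of the polynomial through the points (x_i, y_i); x_i must be distinct."""
    # Columns 1, x, x^2, ..., x^(n-1): the coefficient part of the augmented matrix (14).
    V = np.vander(np.asarray(xs, dtype=float), increasing=True)
    return np.linalg.solve(V, np.asarray(ys, dtype=float))

coeffs = interpolating_coefficients([1, 2, 3, 4], [3, -2, -5, 0])
# coeffs should be close to 4, 3, -5, 1, i.e. p(x) = 4 + 3x - 5x^2 + x^3
```

`numpy.linalg.solve` uses an LU factorization rather than the Gauss-Jordan reduction described in the text. Both methods give the same solution when the coefficient matrix is invertible, which the distinct x-coordinates guarantee.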













Figure 1.8.12 


Later we will give a more efficient method for finding interpolating polynomials that is better suited for 
problems in which the number of data points is large. 

CALCULUS AND CALCULATING UTILITY REQUIRED 

EXAMPLE 7 Approximate Integration 

There is no way to evaluate the integral

\[ \int_0^1 \sin\left(\frac{\pi x^2}{2}\right) dx \]

directly since there is no way to express an antiderivative of the integrand in terms of elementary functions.
This integral could be approximated by Simpson's rule or some comparable method, but an alternative
approach is to approximate the integrand by an interpolating polynomial and integrate the approximating
polynomial. For example, let us consider the five points

\[ x_0 = 0,\ x_1 = 0.25,\ x_2 = 0.5,\ x_3 = 0.75,\ x_4 = 1 \]

that divide the interval [0, 1] into four equally spaced subintervals. The values of

\[ f(x) = \sin\left(\frac{\pi x^2}{2}\right) \]

at these points are approximately

\[ f(0) = 0,\ f(0.25) = 0.098017,\ f(0.5) = 0.382683,\ f(0.75) = 0.77301,\ f(1) = 1 \]

The interpolating polynomial is (verify)

\[ p(x) = 0.098796x + 0.762356x^2 + 2.14429x^3 - 2.00544x^4 \qquad (15) \]

and

\[ \int_0^1 p(x)\,dx \approx 0.438501 \qquad (16) \]

As shown in Figure 1.8.13, the graphs of \(f\) and \(p\) match very closely over the interval [0, 1], so the
approximation is quite good.
Figure 1.8.13
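The "(verify)" step and the integral (16) can be reproduced with a short calculation, assuming NumPy is available. The sketch interpolates \(f\) at the five points, then integrates term by term using \(\int_0^1 a_k x^k\,dx = a_k/(k+1)\). It starts from the exact values of \(f\), not the six-digit values listed above, so the last digits of the coefficients may differ slightly from (15).

```python
import numpy as np

def f(x):
    return np.sin(np.pi * x**2 / 2)

xs = np.linspace(0.0, 1.0, 5)        # 0, 0.25, 0.5, 0.75, 1
V = np.vander(xs, increasing=True)   # rows 1, x_i, x_i^2, x_i^3, x_i^4
a = np.linalg.solve(V, f(xs))        # coefficients a_0, ..., a_4 of p(x)

# The integral over [0, 1] of a_k * x**k is a_k / (k + 1).
approx = sum(a_k / (k + 1) for k, a_k in enumerate(a))
# approx should be close to 0.438501, matching (16)
```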






Concept Review 

Network 

Branches 

Nodes 

Flow conservation 

Electrical circuits: battery, resistor, poles (positive and negative), electrical potential, Ohm's law, Kirchhoff's
current law, Kirchhoff's voltage law

Chemical equations: reactants, products, balanced equation 
Interpolating polynomial 

Skills 

Find the flow rates and directions of flow in branches of a network. 

Find the amount of current flowing through parts of an electrical circuit. 

Write a balanced chemical equation for a given chemical reaction. 

Find an interpolating polynomial for a graph passing through a given collection of points. 


Exercise Set 1.8 

1. The accompanying figure shows a network in which the flow rate and direction of flow in certain branches are known. 
Find the flow rates and directions of flow in the remaining branches. 


Answer:



2. The accompanying figure shows known flow rates of hydrocarbons into and out of a network of pipes at an oil refinery.

(a) Set up a linear system whose solution provides the unknown flow rates.


(b) Solve the system for the unknown flow rates. 

(c) Find the flow rates and directions of flow if *4 = 50 and x$ = 0. 

Figure Ex-2

3. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow 
rates along the streets are measured as the average number of vehicles per hour. 

(a) Set up a linear system whose solution provides the unknown flow rates. 

(b) Solve the system for the unknown flow rates. 

(c) If the flow along the road from A to B must be reduced for construction, what is the minimum flow that is required 
to keep traffic flowing on all roads? 




Answer: 


(a) \(x_3 - x_4 = -500\), \(-x_1 + x_4 = 100\), \(x_1 - x_2 = 300\), \(x_2 - x_3 = 100\)

(b) \(x_1 = -100 + t\), \(x_2 = -400 + t\), \(x_3 = -500 + t\), \(x_4 = t\)

(c) For all rates to be nonnegative, we need \(t \geq 500\) cars per hour; at the minimum, \(t = 500\), we have \(x_1 = 400\), \(x_2 = 100\), \(x_3 = 0\), \(x_4 = 500\)

4. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow 
rates along the streets are measured as the average number of vehicles per hour. 

(a) Set up a linear system whose solution provides the unknown flow rates. 

(b) Solve the system for the unknown flow rates. 

(c) Is it possible to close the road from A to B for construction and keep traffic flowing on the other streets? Explain. 
Figure Ex-4


In Exercises 5-8, analyze the given electrical circuits by finding the unknown currents. 


5 . 















Answer: 



/l = ^A, h=-\ A, /3 = -j-A 


6 . 


7. 




Answer: 

/ i =/ 4 =/ 5 = / 6 = Ia , / 2 =/ 3 = oa 



In Exercises 9-12, write a balanced equation for the given chemical reaction. 
9. C₃H₈ + O₂ → CO₂ + H₂O (propane combustion)

Answer: 


\(x_1 = 1\), \(x_2 = 5\), \(x_3 = 3\), and \(x_4 = 4\); the balanced equation is C₃H₈ + 5O₂ → 3CO₂ + 4H₂O

10. C₆H₁₂O₆ → CO₂ + C₂H₅OH (fermentation of sugar)

11. CH₃COF + H₂O → CH₃COOH + HF






Answer: 


\(x_1 = x_2 = x_3 = x_4 = t\); the balanced equation is CH₃COF + H₂O → CH₃COOH + HF

12. CO₂ + H₂O → C₆H₁₂O₆ + O₂ (photosynthesis)

13. Find the quadratic polynomial whose graph passes through the points (1, 1), (2, 2), and (3, 5). 
Answer: 

\(p(x) = x^2 - 2x + 2\)

14. Find the quadratic polynomial whose graph passes through the points (0, 0), (-1, 1), and (1, 1). 

15. Find the cubic polynomial whose graph passes through the points (-1, -1), (0, 1), (1, 3), (4, -1). 

Answer: 
\(p(x) = 1 + \frac{13}{6}x - \frac{1}{6}x^3\)

16. The accompanying figure shows the graph of a cubic polynomial. Find the polynomial. 



Figure Ex-16 

17. (a) Find an equation that represents the family of all second-degree polynomials that pass through the points (0, 1) and
(1, 2). [Hint: The equation will involve one arbitrary parameter that produces the members of the family when
varied.]

(b) By hand, or with the help of a graphing utility, sketch four curves in the family. 

Answer: 

(a) Using \(a_1 = k\) as a parameter, \(p(x) = 1 + kx + (1 - k)x^2\) where \(-\infty < k < \infty\).

(b) The graphs for k = 0, 1, 2, and 3 are shown. 
18. In this section we have selected only a few applications of linear systems. Using the Internet as a search tool, try to find
some more real-world applications of such systems. Select one that is of interest to you, and write a paragraph about it. 






























True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) In any network, the sum of the flows out of a node must equal the sum of the flows into a node. 

Answer: 

True 

(b) When a current passes through a resistor, there is an increase in the electrical potential in a circuit. 

Answer: 

False 

(c) Kirchhoff's current law states that the sum of the currents flowing into a node equals the sum of the currents flowing out
of the node. 

Answer: 

True 

(d) A chemical equation is called balanced if the total number of atoms on each side of the equation is the same.

Answer: 

False 

(e) Given any \(n\) points in the xy-plane, there is a unique polynomial of degree \(n - 1\) or less whose graph passes through
those points. 

Answer: 

False 
1.9 Leontief Input-Output Models

In 1973 the economist Wassily Leontief was awarded the Nobel prize for his work on economic modeling in which he 
used matrix methods to study the relationships between different sectors in an economy. In this section we will discuss 
some of the ideas developed by Leontief. 


Inputs and Outputs in an Economy 

One way to analyze an economy is to divide it into sectors and study how the sectors interact with one another. For 
example, a simple economy might be divided into three sectors—manufacturing, agriculture, and utilities. Typically, a 
sector will produce certain outputs but will require inputs from the other sectors and itself. For example, the agricultural 
sector may produce wheat as an output but will require inputs of farm machinery from the manufacturing sector, 
electrical power from the utilities sector, and food from its own sector to feed its workers. Thus, we can imagine an 
economy to be a network in which inputs and outputs flow in and out of the sectors; the study of such flows is called 
input-output analysis. Inputs and outputs are commonly measured in monetary units (dollars or millions of dollars, for 
example) but other units of measurement are also possible. 

The flows between sectors of a real economy are not always obvious. For example, in World War II the United States had 
a demand for 50,000 new airplanes that required the construction of many new aluminum manufacturing plants. This 
produced an unexpectedly large demand for certain copper electrical components, which in turn produced a copper 
shortage. The problem was eventually resolved by using silver borrowed from Fort Knox as a copper substitute. In all 
likelihood modern input-output analysis would have anticipated the copper shortage.

Most sectors of an economy will produce outputs, but there may exist sectors that consume outputs without producing 
anything themselves (the consumer market, for example). Those sectors that do not produce outputs are called open 
sectors. Economies with no open sectors are called closed economies , and economies with one or more open sectors are 
called open economies (Figure 1.9.1). In this section we will be concerned with economies with one open sector, and our 
primary goal will be to determine the output levels that are required for the productive sectors to sustain themselves and 
satisfy the demand of the open sector. 


Figure 1.9.1


Leontief Model of an Open Economy 


Let us consider a simple open economy with one open sector and three product-producing sectors: manufacturing, 
agriculture, and utilities. Assume that inputs and outputs are measured in dollars and that the inputs required by the
productive sectors to produce one dollar's worth of output are in accordance with Table 1.


Table 1

                           Income Required per Dollar Output
                           Manufacturing   Agriculture   Utilities
Provider  Manufacturing    $0.50           $0.10         $0.10
          Agriculture      $0.20           $0.50         $0.30
          Utilities        $0.10           $0.30         $0.40



Wassily Leontief (1906-1999) 

It is somewhat ironic that it was the Russian-born Wassily Leontief who won the Nobel prize 
in 1973 for pioneering the modern methods for analyzing free-market economies. Leontief was a precocious
student who entered the University of Leningrad at age 15. Bothered by the intellectual restrictions of the Soviet 
system, he was put in jail for anti-Communist activities, after which he headed for the University of Berlin, 
receiving his Ph.D. there in 1928. He came to the United States in 1931, where he held professorships at Harvard 
and then New York University. 

[Image: © Bettmann/Corbis]


Usually, one would suppress the labeling and express this matrix as

\[ C = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix} \qquad (1) \]

This is called the consumption matrix (or sometimes the technology matrix) for the economy. The column vectors

\[ \mathbf{c}_1 = \begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}, \quad \mathbf{c}_3 = \begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix} \]

in \(C\) list the inputs required by the manufacturing, agricultural, and utilities sectors, respectively, to produce $1.00 worth
of output. These are called the consumption vectors of the sectors. For example, \(\mathbf{c}_1\) tells us that to produce $1.00 worth of
output the manufacturing sector needs $0.50 worth of manufacturing output, $0.20 worth of agricultural output, and
$0.10 worth of utilities output.

What is the economic significance of the row sums 
of the consumption matrix? 





















Continuing with the above example, suppose that the open sector wants the economy to supply it manufactured goods,
agricultural products, and utilities with dollar values:

\(d_1\) dollars of manufactured goods
\(d_2\) dollars of agricultural products
\(d_3\) dollars of utilities

The column vector \(\mathbf{d}\) that has these numbers as successive components is called the outside demand vector. Since the
product-producing sectors consume some of their own output, the dollar value of their output must cover their own needs
plus the outside demand. Suppose that the dollar values required to do this are

\(x_1\) dollars of manufactured goods
\(x_2\) dollars of agricultural products
\(x_3\) dollars of utilities


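The column vector \(\mathbf{x}\) that has these numbers as successive components is called the production vector for the economy.
For the economy with consumption matrix (1), that portion of the production vector \(\mathbf{x}\) that will be consumed by the three
productive sectors is

\[ x_1 \underbrace{\begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by manufacturing}}} + x_2 \underbrace{\begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by agriculture}}} + x_3 \underbrace{\begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by utilities}}} = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = C\mathbf{x} \]

The vector \(C\mathbf{x}\) is called the intermediate demand vector for the economy. Once the intermediate demand is met, the
portion of the production that is left to satisfy the outside demand is \(\mathbf{x} - C\mathbf{x}\). Thus, if the outside demand vector is \(\mathbf{d}\), then
\(\mathbf{x}\) must satisfy the equation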


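\[ \underbrace{\mathbf{x}}_{\substack{\text{Amount} \\ \text{produced}}} - \underbrace{C\mathbf{x}}_{\substack{\text{Intermediate} \\ \text{demand}}} = \underbrace{\mathbf{d}}_{\substack{\text{Outside} \\ \text{demand}}} \]

which we will find convenient to rewrite as

\[ (I - C)\mathbf{x} = \mathbf{d} \qquad (2) \]

The matrix \(I - C\) is called the Leontief matrix and (2) is called the Leontief equation.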

EXAMPLE 1 Satisfying Outside Demand 

Consider the economy described in Table 1. Suppose that the open sector has a demand for $7900 worth of 
manufacturing products, $3950 worth of agricultural products, and $1975 worth of utilities. 

(a) Can the economy meet this demand?

(b) If so, find a production vector x that will meet it exactly. 


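The consumption matrix, production vector, and outside demand vector are

\[ C = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mathbf{d} = \begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix} \]

To meet the outside demand, the vector \(\mathbf{x}\) must satisfy the Leontief equation (2), so the problem reduces to
solving the linear system

\[ \underbrace{\begin{bmatrix} 0.5 & -0.1 & -0.1 \\ -0.2 & 0.5 & -0.3 \\ -0.1 & -0.3 & 0.6 \end{bmatrix}}_{I - C} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}}_{\mathbf{d}} \qquad (4) \]

(if consistent). We leave it for you to show that the reduced row echelon form of the augmented matrix for
this system is

\[ \left[\begin{array}{ccc|c} 1 & 0 & 0 & 27{,}500 \\ 0 & 1 & 0 & 33{,}750 \\ 0 & 0 & 1 & 24{,}750 \end{array}\right] \]

This tells us that (4) is consistent, and the economy can satisfy the demand of the open sector exactly by
producing $27,500 worth of manufacturing output, $33,750 worth of agricultural output, and $24,750
worth of utilities output.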
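Example 1 can be reproduced numerically, assuming NumPy is available. The sketch solves the Leontief equation (2) for the production vector. It then checks that what remains after the intermediate demand \(C\mathbf{x}\) equals the outside demand \(\mathbf{d}\).

```python
import numpy as np

C = np.array([[0.5, 0.1, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.4]])            # consumption matrix (1)
d = np.array([7900.0, 3950.0, 1975.0])    # outside demand vector

leontief = np.eye(3) - C                  # Leontief matrix I - C
x = np.linalg.solve(leontief, d)          # production vector solving (I - C)x = d
intermediate = C @ x                      # intermediate demand Cx
surplus = x - intermediate                # left over for the open sector; should equal d

# x should be close to 27,500, 33,750, 24,750 (dollars)
```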


Productive Open Economies 


In the preceding discussion we considered an open economy with three product-producing sectors; the same ideas apply 
to an open economy with n product-producing sectors. In this case, the consumption matrix, production vector, and 
outside demand vector have the form 


\[ C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} \]

where all entries are nonnegative and

\(c_{ij}\) = the monetary value of the output of the ith sector that is needed by the jth sector to produce one unit of output
\(x_i\) = the monetary value of the output of the ith sector
\(d_i\) = the monetary value of the output of the ith sector that is required to meet the demand of the open sector

Note that the jth column vector of \(C\) contains the monetary values that the jth sector requires of the other
sectors to produce one monetary unit of output, and the ith row vector of \(C\) contains the monetary values required of the
ith sector by the other sectors for each of them to produce one monetary unit of output.


As discussed in our example above, a production vector \(\mathbf{x}\) that meets the demand \(\mathbf{d}\) of the outside sector must satisfy the
Leontief equation

\[ (I - C)\mathbf{x} = \mathbf{d} \]

If the matrix \(I - C\) is invertible, then this equation has the unique solution

\[ \mathbf{x} = (I - C)^{-1}\mathbf{d} \qquad (5) \]

for every demand vector \(\mathbf{d}\). However, for \(\mathbf{x}\) to be a valid production vector it must have nonnegative entries, so the
problem of importance in economics is to determine conditions under which the Leontief equation has a solution with
nonnegative entries.

It is evident from the form of (5) that if \(I - C\) is invertible, and if \((I - C)^{-1}\) has nonnegative entries, then for every
demand vector \(\mathbf{d}\) the corresponding \(\mathbf{x}\) will also have nonnegative entries, and hence will be a valid production vector for
the economy. Economies for which \((I - C)^{-1}\) has nonnegative entries are said to be productive. Such economies are
desirable because demand can always be met by some level of production. The following theorem, whose proof can be
found in many books on economics, gives conditions under which open economies are productive.


THEOREM 1.9.1 

If \(C\) is the consumption matrix for an open economy, and if all of the column sums are less than 1, then the matrix
\(I - C\) is invertible, the entries of \((I - C)^{-1}\) are nonnegative, and the economy is productive.


The jth column sum of \(C\) represents the total dollar value of input that the jth sector requires to produce $1 of
output, so if the jth column sum is less than 1, then the jth sector requires less than $1 of input to produce $1 of output; in
this case we say that the jth sector is profitable. Thus, Theorem 1.9.1 states that if all product-producing sectors of an
open economy are profitable, then the economy is productive. In the exercises we will ask you to show that an open
economy is productive if all of the row sums of \(C\) are less than 1 (Exercise 11). Thus, an open economy is productive if
either all of the column sums or all of the row sums of \(C\) are less than 1.


EXAMPLE 2 An Open Economy Whose Sectors Are All Profitable 

The column sums of the consumption matrix \(C\) in (1) are less than 1, so \((I - C)^{-1}\) exists and has nonnegative
entries. Use a calculating utility to confirm this, and use this inverse to solve Equation (4) in Example 1.


We leave it for you to show that

\[ (I - C)^{-1} \approx \begin{bmatrix} 2.65823 & 1.13924 & 1.01266 \\ 1.89873 & 3.67089 & 2.15190 \\ 1.39241 & 2.02532 & 2.91139 \end{bmatrix} \]

This matrix has nonnegative entries, and

\[ \mathbf{x} = (I - C)^{-1}\mathbf{d} \approx \begin{bmatrix} 2.65823 & 1.13924 & 1.01266 \\ 1.89873 & 3.67089 & 2.15190 \\ 1.39241 & 2.02532 & 2.91139 \end{bmatrix} \begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix} \approx \begin{bmatrix} 27{,}500 \\ 33{,}750 \\ 24{,}750 \end{bmatrix} \]

which is consistent with the solution in Example 1.
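One way to do this "calculating utility" check is with NumPy, assuming it is available. The sketch tests the column-sum condition of Theorem 1.9.1 and the row-sum condition of Exercise 11. It then inverts \(I - C\) and applies (5). The helper `all_sums_below_one` is a hypothetical name. Its small tolerance keeps a sum that is exactly 1 from passing the test because of floating-point rounding.

```python
import numpy as np

C = np.array([[0.5, 0.1, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.4]])           # consumption matrix (1)
d = np.array([7900.0, 3950.0, 1975.0])   # outside demand from Example 1

def all_sums_below_one(sums, tol=1e-12):
    """True if every sum is less than 1 by more than a small floating-point tolerance."""
    return bool(np.all(sums < 1 - tol))

columns_ok = all_sums_below_one(C.sum(axis=0))   # hypothesis of Theorem 1.9.1
rows_ok = all_sums_below_one(C.sum(axis=1))      # hypothesis of Exercise 11

inverse = np.linalg.inv(np.eye(3) - C)           # (I - C)^{-1}
x = inverse @ d                                  # production vector from Equation (5)

print(columns_ok, rows_ok)          # True False
print(bool(np.all(inverse >= 0)))   # True
```

The column test succeeds even though the second row of \(C\) sums to exactly 1. This illustrates that the row-sum condition is sufficient for productivity but not necessary.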


Concept Review 

Sectors 
Inputs
Outputs 

Input-output analysis 
Open sector 

Economies: open, closed 










Consumption (technology) matrix 
Consumption vector 
Outside demand vector 
Production vector 
Intermediate demand vector 
Leontief matrix 
Leontief equation 

Skills 

Construct a consumption matrix for an economy. 

Understand the relationships among the vectors of a sector of an economy: consumption, outside demand, 
production, and intermediate demand. 


Exercise Set 1.9 

1. An automobile mechanic (M) and a body shop (B) use each other's services. For each $1.00 of business that M does, it
uses $0.50 of its own services and $0.25 of B's services, and for each $1.00 of business that B does it uses $0.10 of its
own services and $0.25 of M's services.

(a) Construct a consumption matrix for this economy. 

(b) How much must M and B each produce to provide customers with $7000 worth of mechanical work and $14,000 
worth of body work? 


Answer: 

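(a) \( C = \begin{bmatrix} 0.50 & 0.25 \\ 0.25 & 0.10 \end{bmatrix} \)

(b) \( \mathbf{x} \approx \begin{bmatrix} \$25{,}290 \\ \$22{,}581 \end{bmatrix} \)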


2. A simple economy produces food (F) and housing ( H ). The production of $1.00 worth of food requires $0.30 worth of 
food and $0.10 worth of housing, and the production of $1.00 worth of housing requires $0.20 worth of food and
$0.60 worth of housing. 

(a) Construct a consumption matrix for this economy. 

(b) What dollar value of food and housing must be produced for the economy to provide consumers $130,000 worth 
of food and $130,000 worth of housing? 

3. Consider the open economy described by the accompanying table, where the input is in dollars needed for $1.00 of 
output. 

(a) Find the consumption matrix for the economy. 

(b) Suppose that the open sector has a demand for $1930 worth of housing, $3860 worth of food, and $5790 worth of 
utilities. Use row reduction to find a production vector that will meet this demand exactly. 

Table Ex-3

                      Income Required per Dollar Output
                      Housing   Food    Utilities
Provider  Housing     $0.10     $0.60   $0.40
          Food        $0.30     $0.20   $0.30
          Utilities   $0.40     $0.10   $0.20


Answer: 


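(a) \( C = \begin{bmatrix} 0.1 & 0.6 & 0.4 \\ 0.3 & 0.2 & 0.3 \\ 0.4 & 0.1 & 0.2 \end{bmatrix} \)

(b) \( \mathbf{x} = \begin{bmatrix} \$31{,}500 \\ \$26{,}500 \\ \$26{,}300 \end{bmatrix} \)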

4. A company produces Web design, software, and networking services. View the company as an open economy 
described by the accompanying table, where input is in dollars needed for $1.00 of output. 

(a) Find the consumption matrix for the company. 

(b) Suppose that the customers (the open sector) have a demand for $5400 worth of Web design, $2700 worth of 
software, and $900 worth of networking. Use row reduction to find a production vector that will meet this demand 
exactly. 


Table Ex-4

                       Income Required per Dollar Output
                       Web Design   Software   Networking
Provider  Web Design   $0.40        $0.20      $0.45
          Software     $0.30        $0.35      $0.30
          Networking   $0.15        $0.10      $0.20


In Exercises 5-6, use matrix inversion to find the production vector x that meets the demand d for the consumption 
matrix C. 




Answer: 


6 . 


123.08 

202.56 

0.3 


C = 


0.3 



7. Consider an open economy with consumption matrix 


1 
































(a) Show that the economy can meet a demand of \(d_1 = 2\) units from the first sector and \(d_2 = 0\) units from the second
sector, but it cannot meet a demand of \(d_1 = 2\) units from the first sector and \(d_2 = 1\) unit from the second sector.

(b) Give both a mathematical and an economic explanation of the result in part (a). 


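8. Consider an open economy with consumption matrix

$$C = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{8} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{8} \end{bmatrix}$$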


If the open sector demands the same dollar value from each product-producing sector, which such sector must 
produce the greatest dollar value to meet the demand? 


9. Consider an open economy with consumption matrix

$$C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & 0 \end{bmatrix}$$

Show that the Leontief equation $\mathbf{x} - C\mathbf{x} = \mathbf{d}$ has a unique solution for every demand vector $\mathbf{d}$ if $c_{21}c_{12} < 1 - c_{11}$.

10. (a) Consider an open economy with a consumption matrix $C$ whose column sums are less than 1, and let $\mathbf{x}$ be the production vector that satisfies an outside demand $\mathbf{d}$; that is, $(I - C)^{-1}\mathbf{d} = \mathbf{x}$. Let $\mathbf{d}_j$ be the demand vector that is obtained by increasing the $j$th entry of $\mathbf{d}$ by 1 and leaving the other entries fixed. Prove that the production vector $\mathbf{x}_j$ that meets this demand is

$$\mathbf{x}_j = \mathbf{x} + j\text{th column vector of } (I - C)^{-1}$$

(b) In words, what is the economic significance of the $j$th column vector of $(I - C)^{-1}$? [Hint: Look at $\mathbf{x}_j - \mathbf{x}$.]
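Part (a) can be checked numerically before it is proved. The sketch below assumes exact arithmetic with Python's `fractions` module and reuses the consumption matrix of Exercise 3 (its column sums are 0.8, 0.9, and 0.9, so the hypothesis holds). It computes $(I - C)^{-1}$ by Gauss-Jordan elimination and confirms that raising the $j$th demand by one unit changes the production vector by exactly the $j$th column of $(I - C)^{-1}$. The helper names `inverse` and `matvec` are hypothetical.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I].

    Exact arithmetic with Fraction; assumes A is invertible.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]  # scale to get a leading 1
        for r in range(n):  # clear the rest of the column, above and below
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [u - f * w for u, w in zip(M[r], M[col])]
    return [row[n:] for row in M]  # the right half is now the inverse


def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


C = [[Fraction(1, 10), Fraction(6, 10), Fraction(4, 10)],
     [Fraction(3, 10), Fraction(2, 10), Fraction(3, 10)],
     [Fraction(4, 10), Fraction(1, 10), Fraction(2, 10)]]
n = len(C)
L_inv = inverse([[int(i == j) - C[i][j] for j in range(n)] for i in range(n)])

d = [1930, 3860, 5790]
x = matvec(L_inv, d)

differences, columns = [], []
for j in range(n):
    d_j = [di + (1 if i == j else 0) for i, di in enumerate(d)]  # raise jth demand by 1
    x_j = matvec(L_inv, d_j)
    differences.append([a - b for a, b in zip(x_j, x)])
    columns.append([L_inv[i][j] for i in range(n)])

print(differences == columns)  # True
```

A numerical agreement like this is only evidence; the proof in part (a) follows from the linearity of $\mathbf{d} \mapsto (I - C)^{-1}\mathbf{d}$.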


11. Prove: If $C$ is an $n \times n$ matrix whose entries are nonnegative and whose row sums are less than 1, then $I - C$ is invertible and has nonnegative entries. [Hint: $(A^T)^{-1} = (A^{-1})^T$ for any invertible matrix $A$.]


True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) Sectors of an economy that produce outputs are called open sectors. 

Answer: 

False 

(b) A closed economy is an economy that has no open sectors. 

Answer: 

True 

(c) The rows of a consumption matrix represent the outputs in a sector of an economy. 
Answer: 


False 






(d) If the column sums of the consumption matrix are all less than 1, then the Leontief matrix is invertible.
Answer: 

True 

(e) The Leontief equation relates the production vector for an economy to the outside demand vector.
Answer: 

True 




Supplementary Exercises 


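In Exercises 1-4 the given matrix represents an augmented matrix for a linear system. Write the corresponding set of linear equations for the system, and use Gaussian elimination to solve the linear system. Introduce free parameters as necessary.

1. $$\begin{bmatrix} 3 & -1 & 0 & 4 & 1 \\ 2 & 0 & 3 & 3 & -1 \end{bmatrix}$$

Answer:

$$\begin{aligned} 3x_1 - x_2 + 4x_4 &= 1 \\ 2x_1 + 3x_3 + 3x_4 &= -1 \end{aligned}$$

$x_1 = -\tfrac{1}{2} - \tfrac{3}{2}s - \tfrac{3}{2}t,\quad x_2 = -\tfrac{5}{2} - \tfrac{9}{2}s - \tfrac{1}{2}t,\quad x_3 = s,\quad x_4 = t$

2. $$\begin{bmatrix} 1 & 4 & -1 \\ -2 & -8 & 2 \\ 3 & 12 & -3 \\ 0 & 0 & 0 \end{bmatrix}$$

3. $$\begin{bmatrix} 2 & -4 & 1 & 6 \\ -4 & 0 & 3 & -1 \\ 0 & 1 & -1 & 3 \end{bmatrix}$$

Answer:

$$\begin{aligned} 2x_1 - 4x_2 + x_3 &= 6 \\ -4x_1 + 3x_3 &= -1 \\ x_2 - x_3 &= 3 \end{aligned}$$

$x_1 = -\tfrac{17}{2},\quad x_2 = -\tfrac{26}{3},\quad x_3 = -\tfrac{35}{3}$

4. $$\begin{bmatrix} 3 & 1 & -2 \\ -9 & -3 & 6 \\ 6 & 2 & 1 \end{bmatrix}$$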
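The parametric solution of Exercise 1 can be read off from the reduced row echelon form of its augmented matrix. The following sketch, assuming Python with the standard `fractions` module (the function name `rref` is hypothetical), performs Gauss-Jordan elimination exactly; the columns without a leading 1 correspond to the free variables $x_3 = s$ and $x_4 = t$.

```python
from fractions import Fraction


def rref(M):
    """Return the reduced row echelon form of M (Gauss-Jordan, exact arithmetic)."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    lead = 0  # row where the next leading 1 will be placed
    for c in range(cols):
        if lead == rows:
            break
        pivot = next((r for r in range(lead, rows) if A[r][c] != 0), None)
        if pivot is None:
            continue  # no leading 1 in this column: a free variable
        A[lead], A[pivot] = A[pivot], A[lead]
        p = A[lead][c]
        A[lead] = [v / p for v in A[lead]]
        for r in range(rows):  # clear the column above and below the leading 1
            if r != lead and A[r][c] != 0:
                f = A[r][c]
                A[r] = [u - f * w for u, w in zip(A[r], A[lead])]
        lead += 1
    return A


# Augmented matrix of Exercise 1.
R = rref([[3, -1, 0, 4, 1],
          [2, 0, 3, 3, -1]])
for row in R:
    print([str(v) for v in row])
# ['1', '0', '3/2', '3/2', '-1/2']
# ['0', '1', '9/2', '1/2', '-5/2']
```

Reading the rows as equations gives $x_1 + \tfrac{3}{2}x_3 + \tfrac{3}{2}x_4 = -\tfrac{1}{2}$ and $x_2 + \tfrac{9}{2}x_3 + \tfrac{1}{2}x_4 = -\tfrac{5}{2}$, which match the stated answer.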


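5. Use Gauss-Jordan elimination to solve for x' and y' in terms of x and y.

$$\begin{aligned} x &= \tfrac{3}{5}x' - \tfrac{4}{5}y' \\ y &= \tfrac{4}{5}x' + \tfrac{3}{5}y' \end{aligned}$$

Answer:

$$\begin{aligned} x' &= \tfrac{3}{5}x + \tfrac{4}{5}y \\ y' &= -\tfrac{4}{5}x + \tfrac{3}{5}y \end{aligned}$$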


6. Use Gauss-Jordan elimination to solve for x' and y' in terms of x and y.

$$\begin{aligned} x &= x'\cos\theta - y'\sin\theta \\ y &= x'\sin\theta + y'\cos\theta \end{aligned}$$

7. Find positive integers that satisfy

$$\begin{aligned} x + y + z &= 9 \\ x + 5y + 10z &= 44 \end{aligned}$$

Answer:

$x = 4,\quad y = 2,\quad z = 3$
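Because the system in Exercise 7 has more unknowns than equations, it is the positivity requirement that pins down the answer. Subtracting the first equation from the second gives $4y + 9z = 35$. A short exhaustive search in Python (a sanity check, not the elimination method itself) confirms that $(4, 2, 3)$ is the only positive-integer solution.

```python
# Enumerate positive integers with x + y + z = 9 (so z = 9 - x - y >= 1),
# and keep the triples that also satisfy x + 5y + 10z = 44.
solutions = [
    (x, y, 9 - x - y)
    for x in range(1, 8)
    for y in range(1, 9 - x)
    if x + 5 * y + 10 * (9 - x - y) == 44
]
print(solutions)  # [(4, 2, 3)]
```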

8 . A box containing pennies, nickels, and dimes has 13 coins with a total value of 83 cents. How many coins 
of each type are in the box? 

9. Let

$$\begin{bmatrix} a & 0 & b & 2 \\ a & a & 4 & 4 \\ 0 & a & 2 & b \end{bmatrix}$$

be the augmented matrix for a linear system. Find for what values of a and b the system has

(a) a unique solution.

(b) a one-parameter solution.

(c) a two-parameter solution.

(d) no solution.

Answer:

(a) $a \neq 0,\ b \neq 2$

(b) $a \neq 0,\ b = 2$

(c) $a = 0,\ b = 2$

(d) $a = 0,\ b \neq 2$

10. For which value(s) of a does the following system have zero solutions? One solution? Infinitely many solutions?

$$\begin{aligned} x_1 + x_2 + x_3 &= 4 \\ x_3 &= 2 \\ (a^2 - 4)x_3 &= a - 2 \end{aligned}$$

11. Find a matrix K such that AKB = C given that

$$C = \begin{bmatrix} 8 & 6 & -6 \\ 6 & -1 & 1 \\ -4 & 0 & 0 \end{bmatrix}$$



Answer: 


K = 


0 2 
1 1 


12. How should the coefficients a, b, and c be chosen so that the system

$$\begin{aligned} ax + by - 3z &= -3 \\ -2x - by + cz &= -1 \\ ax + 3y - cz &= -3 \end{aligned}$$

has the solution $x = 1$, $y = -1$, and $z = 2$?


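13. In each part, solve the matrix equation for X.

(a) $$X\begin{bmatrix} -1 & 0 & 1 \\ 1 & 1 & 0 \\ 3 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 1 & 5 \end{bmatrix}$$

(b) $$X\begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -5 & -1 & 0 \\ 6 & -3 & 7 \end{bmatrix}$$

(c) $$\begin{bmatrix} 3 & 1 \\ -1 & 2 \end{bmatrix}X - X\begin{bmatrix} 1 & 4 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 2 & -2 \\ 5 & 4 \end{bmatrix}$$

Answer:

(a) $$X = \begin{bmatrix} -1 & 3 & -1 \\ 6 & 0 & 1 \end{bmatrix}$$

(b) $$X = \begin{bmatrix} 1 & -2 \\ 3 & 1 \end{bmatrix}$$

(c) $$X = \begin{bmatrix} -\tfrac{113}{37} & -\tfrac{160}{37} \\ -\tfrac{20}{37} & -\tfrac{46}{37} \end{bmatrix}$$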
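Part (c) is not of the form $AX = B$, but it is still a linear system: each entry of $AX - XB$ is a linear combination of the four unknown entries of $X$. The sketch below (Python with `fractions`; the names `solve` and `idx` are hypothetical) writes out that $4 \times 4$ system explicitly and solves it by Gaussian elimination, reproducing the stated answer.

```python
from fractions import Fraction


def solve(A, b):
    """Gaussian elimination with back substitution; assumes A is invertible."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [u - f * p for u, p in zip(M[r], M[col])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


A = [[3, 1], [-1, 2]]
B = [[1, 4], [2, 0]]
C = [[2, -2], [5, 4]]
n = 2


def idx(k, j):
    """Position of the unknown X[k][j] in the flattened unknown vector."""
    return n * k + j


rows, rhs = [], []
for i in range(n):
    for j in range(n):
        coeffs = [0] * (n * n)
        for k in range(n):
            coeffs[idx(k, j)] += A[i][k]  # (AX)[i][j] = sum_k A[i][k] X[k][j]
            coeffs[idx(i, k)] -= B[k][j]  # (XB)[i][j] = sum_k X[i][k] B[k][j]
        rows.append(coeffs)
        rhs.append(C[i][j])

v = solve(rows, rhs)
X = [[v[idx(i, j)] for j in range(n)] for i in range(n)]
print([[str(e) for e in row] for row in X])
# [['-113/37', '-160/37'], ['-20/37', '-46/37']]
```

Flattening the unknown matrix into a vector like this works for any equation of the form $AX - XB = C$ with a unique solution.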


14. Let A be a square matrix.

(a) Show that $(I - A)^{-1} = I + A + A^2 + A^3$ if $A^4 = 0$.

(b) Show that

$$(I - A)^{-1} = I + A + A^2 + \cdots + A^n$$

if $A^{n+1} = 0$.


15. Find values of a, b, and c such that the graph of the polynomial $p(x) = ax^2 + bx + c$ passes through the points (1, 2), (-1, 6), and (2, 3).

Answer:

$a = 1,\quad b = -2,\quad c = 3$
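Each point $(x_i, y_i)$ gives one linear equation $a x_i^2 + b x_i + c = y_i$ in the unknowns $a$, $b$, $c$, so Exercise 15 is a $3 \times 3$ linear system. A minimal Python sketch, assuming exact arithmetic via `fractions` (the helper name `solve` is hypothetical):

```python
from fractions import Fraction


def solve(A, b):
    """Gaussian elimination with back substitution; assumes A is invertible."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [u - f * p for u, p in zip(M[r], M[col])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


points = [(1, 2), (-1, 6), (2, 3)]
# One equation a*x^2 + b*x + c = y per point.
coefficients = [[px * px, px, 1] for px, _ in points]
values = [py for _, py in points]
a, b, c = solve(coefficients, values)
print(a, b, c)  # 1 -2 3
```

Because the three $x$-coordinates are distinct, the coefficient matrix is invertible, so the interpolating quadratic is unique.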

16. (Calculus required) Find values of a, b, and c such that the graph of the polynomial $p(x) = ax^2 + bx + c$ passes through the point (-1, 0) and has a horizontal tangent at (2, -9).

17. Let $J_n$ be the $n \times n$ matrix each of whose entries is 1. Show that if $n > 1$, then

$$(I - J_n)^{-1} = I - \frac{1}{n - 1}J_n$$

18. Show that if a square matrix A satisfies

$$A^3 + 4A^2 - 2A + 7I = 0$$

then so does $A^T$.

19. Prove: If B is invertible, then $AB^{-1} = B^{-1}A$ if and only if $AB = BA$.

20. Prove: If A is invertible, then $A + B$ and $I + BA^{-1}$ are both invertible or both not invertible.



23. (Calculus required) Use part (c) of Exercise 22 to show that

$$\frac{dA^{-1}}{dx} = -A^{-1}\frac{dA}{dx}A^{-1}$$

State all the assumptions you make in obtaining this formula.

24. Assuming that the stated inverses exist, prove the following equalities.

(a) $(C^{-1} + D^{-1})^{-1} = C(C + D)^{-1}D$

(b) $(I + CD)^{-1}C = C(I + DC)^{-1}$

(c) $(C + DD^T)^{-1}D = C^{-1}D(I + D^TC^{-1}D)^{-1}$

Determinants


CHAPTER CONTENTS 

Determinants by Cofactor Expansion 
Evaluating Determinants by Row Reduction 
Properties of Determinants; Cramer's Rule 


INTRODUCTION 

In this chapter we will study “determinants” or, more precisely, “determinant functions.” Unlike real-valued functions, such as $f(x) = x^2$, that assign a real number to a real variable $x$, determinant functions assign a real number $f(A)$ to a matrix variable $A$. Although determinants first arose in the context of solving systems of linear equations, they are no longer used for that purpose in real-world applications. They can still be useful for solving very small linear systems (say two or three unknowns), but our main interest in them stems from the fact that they link together various concepts in linear algebra and provide a useful formula for the inverse of a matrix.




2.1 Determinants by Cofactor Expansion 

In this section we will define the notion of a “determinant.” This will enable us to give a specific formula for the inverse of an 
invertible matrix, whereas up to now we have had only a computational procedure for finding it. This, in turn, will eventually 
provide us with a formula for solutions of certain kinds of linear systems. 

Recall from Theorem 1.4.5 that the $2 \times 2$ matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is invertible if and only if $ad - bc \neq 0$ and that the expression $ad - bc$ is called the determinant of the matrix A. Recall also that this determinant is denoted by writing

$$\det(A) = ad - bc \quad\text{or}\quad \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \tag{1}$$

and that the inverse of A can be expressed in terms of the determinant as

$$A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{2}$$

WARNING

It is important to keep in mind that det(A) is a number, whereas A is a matrix.


Minors and Cofactors 


One of our main goals in this chapter is to obtain an analog of Formula 2 that is applicable to square matrices of all orders. For this purpose we will find it convenient to use subscripted entries when writing matrices or determinants. Thus, if we denote a $2 \times 2$ matrix as

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

then the two equations in (1) take the form

$$\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{3}$$

We define the determinant of a $1 \times 1$ matrix $A = [a_{11}]$ as

$$\det[A] = \det[a_{11}] = a_{11}$$


The following definition will be key to our goal of extending the definition of a determinant to higher order matrices. 

DEFINITION 1

If A is a square matrix, then the minor of entry $a_{ij}$ is denoted by $M_{ij}$ and is defined to be the determinant of the submatrix that remains after the $i$th row and $j$th column are deleted from A. The number $(-1)^{i+j}M_{ij}$ is denoted by $C_{ij}$ and is called the cofactor of entry $a_{ij}$.


EXAMPLE 1 Finding Minors and Cofactors

Let

$$A = \begin{bmatrix} 3 & 1 & -4 \\ 2 & 5 & 6 \\ 1 & 4 & 8 \end{bmatrix}$$

WARNING

We have followed the standard convention of using capital letters to denote minors and cofactors even though they are numbers, not matrices.

The minor of entry $a_{11}$ is

$$M_{11} = \begin{vmatrix} 5 & 6 \\ 4 & 8 \end{vmatrix} = 16$$

The cofactor of $a_{11}$ is

$$C_{11} = (-1)^{1+1}M_{11} = M_{11} = 16$$

Similarly, the minor of entry $a_{32}$ is

$$M_{32} = \begin{vmatrix} 3 & -4 \\ 2 & 6 \end{vmatrix} = 26$$

The cofactor of $a_{32}$ is

$$C_{32} = (-1)^{3+2}M_{32} = -M_{32} = -26$$
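Definition 1 can be expressed in a few lines of Python. In this sketch (all function names are hypothetical), `minor` deletes a row and a column and takes the determinant of what remains, computed here by a recursive first-row cofactor expansion, and `cofactor` attaches the sign $(-1)^{i+j}$. Because Python counts from 0, the entry $a_{32}$ of the text corresponds to the indices `(2, 1)`; shifting both indices by 1 does not change the parity of $i + j$, so the sign is unaffected.

```python
def submatrix(A, i, j):
    """Delete row i and column j (0-based indices) from the square matrix A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j)) for j in range(len(A)))


def minor(A, i, j):
    """M_ij: determinant of the submatrix left after deleting row i and column j."""
    return det(submatrix(A, i, j))


def cofactor(A, i, j):
    """C_ij = (-1)^(i+j) M_ij."""
    return (-1) ** (i + j) * minor(A, i, j)


A = [[3, 1, -4],
     [2, 5, 6],
     [1, 4, 8]]

# M_11 is minor(A, 0, 0) and M_32 is minor(A, 2, 1) in 0-based indexing.
print(minor(A, 0, 0), cofactor(A, 0, 0))  # 16 16
print(minor(A, 2, 1), cofactor(A, 2, 1))  # 26 -26
```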


The term determinant was first introduced by the German mathematician Carl Friedrich 
Gauss in 1801 (see p. 15), who used them to “determine” properties of certain kinds of functions. 
Interestingly, the term matrix is derived from a Latin word for “womb” because it was viewed as a container 
of determinants. 


The term minor is apparently due to the English mathematician James Sylvester (see p. 
34), who wrote the following in a paper published in 1850: “Now conceive any one line and any one column 
be struck out, we get... a square, one term less in breadth and depth than the original square; and by varying 
in every possible selection of the line and column excluded, we obtain, supposing the original square to 
consist of n lines and n columns, n² such minor squares, each of which will represent what I term a “First
Minor Determinant” relative to the principal or complete determinant.” 


Note that a minor $M_{ij}$ and its corresponding cofactor $C_{ij}$ are either the same or negatives of each other and that the relating sign $(-1)^{i+j}$ is either $+1$ or $-1$ in accordance with the pattern in the “checkerboard” array

$$\begin{matrix} + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \end{matrix}$$

For example,

$$C_{11} = M_{11}, \quad C_{21} = -M_{21}, \quad C_{22} = M_{22}$$

and so forth. Thus, it is never really necessary to calculate $(-1)^{i+j}$ to calculate $C_{ij}$: you can simply compute the minor $M_{ij}$ and then adjust the sign in accordance with the checkerboard pattern. Try this in Example 1.


EXAMPLE 2 Cofactor Expansions of a $2 \times 2$ Matrix

The checkerboard pattern for a $2 \times 2$ matrix $A = [a_{ij}]$ is

$$\begin{matrix} + & - \\ - & + \end{matrix}$$

so that

$$\begin{aligned} C_{11} &= M_{11} = a_{22}, & C_{12} &= -M_{12} = -a_{21} \\ C_{21} &= -M_{21} = -a_{12}, & C_{22} &= M_{22} = a_{11} \end{aligned}$$

We leave it for you to use Formula 3 to verify that det(A) can be expressed in terms of cofactors in the following four ways:

$$\begin{aligned} \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} &= a_{11}C_{11} + a_{12}C_{12} \\ &= a_{21}C_{21} + a_{22}C_{22} \\ &= a_{11}C_{11} + a_{21}C_{21} \\ &= a_{12}C_{12} + a_{22}C_{22} \end{aligned} \tag{4}$$

Each of the last four equations is called a cofactor expansion of det(A). In each cofactor expansion the entries and cofactors all come from the same row or same column of A. For example, in the first equation the entries and cofactors all come from the first row of A, in the second they all come from the second row of A, in the third they all come from the first column of A, and in the fourth they all come from the second column of A.


Definition of a General Determinant 

Formula 4 is a special case of the following general result, which we will state without proof. 


THEOREM 2.1.1 

If A is an n x n matrix, then regardless of which row or column of A is chosen, the number obtained by multiplying the 
entries in that row or column by the corresponding cofactors and adding the resulting products is always the same. 






This result allows us to make the following definition. 


DEFINITION 2

If A is an $n \times n$ matrix, then the number obtained by multiplying the entries in any row or column of A by the corresponding cofactors and adding the resulting products is called the determinant of A, and the sums themselves are called cofactor expansions of A. That is,

$$\det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \tag{5}$$

[cofactor expansion along the $j$th column]

and

$$\det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \tag{6}$$

[cofactor expansion along the $i$th row]
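Definition 2 translates directly into a recursive function. The Python sketch below (the names `expand_row` and `expand_col` are hypothetical, and indices are 0-based) expands along any chosen row or column, using first-row expansion for the smaller determinants. Applied to the matrix of Example 3, it returns the same value for every row and every column, as Theorem 2.1.1 asserts.

```python
def submatrix(A, i, j):
    """Delete row i and column j (0-based) from A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def expand_row(A, i):
    """Cofactor expansion along row i: sum over j of a_ij * C_ij."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[i][j] * (-1) ** (i + j) * expand_row(submatrix(A, i, j), 0)
               for j in range(len(A)))


def expand_col(A, j):
    """Cofactor expansion along column j: sum over i of a_ij * C_ij."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[i][j] * (-1) ** (i + j) * expand_row(submatrix(A, i, j), 0)
               for i in range(len(A)))


A = [[3, 1, 0],
     [-2, -4, 3],
     [5, 4, -2]]

# Theorem 2.1.1: every row and every column gives the same number.
by_rows = [expand_row(A, i) for i in range(3)]
by_cols = [expand_col(A, j) for j in range(3)]
print(by_rows, by_cols)  # [-1, -1, -1] [-1, -1, -1]
```

Each level of the recursion calls itself $n$ times, so this approach takes on the order of $n!$ operations; it is practical only for small matrices.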


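EXAMPLE 3 Cofactor Expansion Along the First Row

Find the determinant of the matrix

$$A = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix}$$

by cofactor expansion along the first row.

Solution

$$\det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} = 3\begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - 1\begin{vmatrix} -2 & 3 \\ 5 & -2 \end{vmatrix} + 0\begin{vmatrix} -2 & -4 \\ 5 & 4 \end{vmatrix} = 3(-4) - (1)(-11) + 0 = -1$$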


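EXAMPLE 4 Cofactor Expansion Along the First Column

Let A be the matrix in Example 3, and evaluate det(A) by cofactor expansion along the first column of A.

Solution

$$\det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} = 3\begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - (-2)\begin{vmatrix} 1 & 0 \\ 4 & -2 \end{vmatrix} + 5\begin{vmatrix} 1 & 0 \\ -4 & 3 \end{vmatrix} = 3(-4) - (-2)(-2) + 5(3) = -1$$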


Note that in Example 4 we had to compute three 
cofactors, whereas in Example 3 only two were 
needed because the third was multiplied by zero. 
As a rule, the best strategy for cofactor 
expansion is to expand along a row or column 
with the most zeros. 




















This agrees with the result obtained in Example 3. 



Charles Lutwidge Dodgson (Lewis Carroll) (1832-1898) 

Cofactor expansion is not the only method for expressing the determinant of a matrix 
in terms of determinants of lower order. For example, although it is not well known, the English 
mathematician Charles Dodgson, who was the author of Alice's Adventures in Wonderland and Through 
the Looking Glass under the pen name of Lewis Carroll, invented such a method, called “condensation.”
That method has recently been resurrected from obscurity because of its suitability for parallel
processing on computers. 

[Image: Time & Life Pictures/Getty Images, Inc.] 


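EXAMPLE 5 Smart Choice of Row or Column

If A is the $4 \times 4$ matrix

$$A = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 3 & 1 & 2 & 2 \\ 1 & 0 & -2 & 1 \\ 2 & 0 & 0 & 1 \end{bmatrix}$$

then to find det(A) it will be easiest to use cofactor expansion along the second column, since it has the most zeros:

$$\det(A) = 1 \cdot \begin{vmatrix} 1 & 0 & -1 \\ 1 & -2 & 1 \\ 2 & 0 & 1 \end{vmatrix}$$

For the $3 \times 3$ determinant, it will be easiest to use cofactor expansion along its second column, since it has the most zeros:

$$\det(A) = 1 \cdot (-2) \cdot \begin{vmatrix} 1 & -1 \\ 2 & 1 \end{vmatrix} = -2(1 + 2) = -6$$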
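The strategy of Example 5 (expand along the row or column with the most zeros, and skip zero entries, whose products with cofactors vanish) can be applied recursively at every level. The following Python function is a sketch of that heuristic rather than a method stated in the text; the name `det_smart` is hypothetical. Note that the entry in row 2, column 1 of the $4 \times 4$ matrix never enters the computation, because that row is deleted in the first expansion step.

```python
def submatrix(A, i, j):
    """Delete row i and column j (0-based) from A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det_smart(A):
    """Cofactor expansion along the row or column containing the most zeros.

    Zero entries are skipped, so their minors are never computed.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    # Count zeros in every row and column; pick the first line with the most.
    lines = [("row", i, sum(1 for v in A[i] if v == 0)) for i in range(n)]
    lines += [("col", j, sum(1 for i in range(n) if A[i][j] == 0)) for j in range(n)]
    kind, k, _ = max(lines, key=lambda t: t[2])
    total = 0
    for m in range(n):
        i, j = (k, m) if kind == "row" else (m, k)
        if A[i][j] != 0:
            total += (-1) ** (i + j) * A[i][j] * det_smart(submatrix(A, i, j))
    return total


A = [[1, 0, 0, -1],
     [3, 1, 2, 2],
     [1, 0, -2, 1],
     [2, 0, 0, 1]]
print(det_smart(A))  # -6
```

On this matrix the function picks the second column at the first level and the second column of the $3 \times 3$ minor at the next, exactly as in the hand computation.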


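EXAMPLE 6 Determinant of a Lower Triangular Matrix

The following computation shows that the determinant of a $4 \times 4$ lower triangular matrix is the product of its diagonal entries. Each part of the computation uses a cofactor expansion along the first row.

$$\begin{vmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & 0 & 0 \\ a_{32} & a_{33} & 0 \\ a_{42} & a_{43} & a_{44} \end{vmatrix} = a_{11}a_{22}\begin{vmatrix} a_{33} & 0 \\ a_{43} & a_{44} \end{vmatrix} = a_{11}a_{22}a_{33}|a_{44}| = a_{11}a_{22}a_{33}a_{44}$$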


The method illustrated in Example 6 can be easily adapted to prove the following general result. 


THEOREM 2.1.2

If A is an $n \times n$ triangular matrix (upper triangular, lower triangular, or diagonal), then det(A) is the product of the entries on the main diagonal of the matrix; that is, $\det(A) = a_{11}a_{22}\cdots a_{nn}$.
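Theorem 2.1.2 can be spot-checked on small examples. The sketch below uses a hypothetical $4 \times 4$ lower triangular matrix `L`, its transpose `U` (upper triangular), and the diagonal matrix `D` with the same main diagonal, and compares their determinants, computed by first-row cofactor expansion, with the product of the diagonal entries.

```python
from math import prod


def submatrix(A, i, j):
    """Delete row i and column j (0-based) from A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j)) for j in range(len(A)))


# Hypothetical test matrices sharing the main diagonal 2, -1, 4, 3.
L = [[2, 0, 0, 0],
     [5, -1, 0, 0],
     [7, 3, 4, 0],
     [1, 8, 6, 3]]
U = [list(col) for col in zip(*L)]  # transpose: upper triangular
D = [[L[i][i] if i == j else 0 for j in range(4)] for i in range(4)]

diagonal_product = prod(L[i][i] for i in range(4))
print(det(L), det(U), det(D), diagonal_product)  # -24 -24 -24 -24
```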


A Useful Technique for Evaluating $2 \times 2$ and $3 \times 3$ Determinants

Determinants of $2 \times 2$ and $3 \times 3$ matrices can be evaluated very efficiently using the pattern suggested in Figure 2.1.1.

[Figure 2.1.1: arrow patterns for a 2×2 determinant and for a 3×3 determinant with its first and second columns recopied to the right]

In the 2x2 case, the determinant can be computed by forming the product of the entries on the rightward arrow and 
subtracting the product of the entries on the leftward arrow. In the 3 x 3 case we first recopy the first and second columns as 
shown in the figure, after which we can compute the determinant by summing the products of the entries on the rightward 
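arrows and subtracting the products on the leftward arrows. These procedures execute the computations

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

$$\begin{aligned} \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\ &= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\ &= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} \end{aligned}$$

WARNING

The arrow technique only works for determinants of $2 \times 2$ and $3 \times 3$ matrices.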




















which agrees with the cofactor expansions along the first row. 


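EXAMPLE 7 A Technique for Evaluating $2 \times 2$ and $3 \times 3$ Determinants

$$\begin{vmatrix} 3 & 1 \\ 4 & -2 \end{vmatrix} = (3)(-2) - (1)(4) = -10$$

$$\begin{vmatrix} 1 & 2 & 3 \\ -4 & 5 & 6 \\ 7 & -8 & 9 \end{vmatrix} = [45 + 84 + 96] - [105 - 48 - 72] = 240$$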
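Both arrow rules translate directly into code. In this Python sketch (the function names `det2` and `det3_arrows` are hypothetical), the $3 \times 3$ version literally recopies the first two columns to the right of the matrix, as in Figure 2.1.1, and then walks the three rightward and three leftward diagonals. It reproduces the two values of Example 7.

```python
def det2(A):
    """2 x 2 arrow rule: rightward product minus leftward product."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def det3_arrows(A):
    """3 x 3 arrow technique: recopy the first two columns to the right,
    add the three rightward diagonal products, subtract the three leftward ones."""
    B = [row + row[:2] for row in A]  # 3 x 5 array
    right = sum(B[0][k] * B[1][k + 1] * B[2][k + 2] for k in range(3))
    left = sum(B[0][k + 2] * B[1][k + 1] * B[2][k] for k in range(3))
    return right - left


print(det2([[3, 1], [4, -2]]))                           # -10
print(det3_arrows([[1, 2, 3], [-4, 5, 6], [7, -8, 9]]))  # 240
```

As the warning notes, the same diagonal pattern does not give the determinant of a $4 \times 4$ or larger matrix.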


Concept Review 

Determinant 

Minor 

Cofactor 

Cofactor expansion 

Skills 

Find the minors and cofactors of a square matrix. 

Use cofactor expansion to evaluate the determinant of a square matrix. 

Use the arrow technique to evaluate the determinant of a $2 \times 2$ or $3 \times 3$ matrix.

Use the determinant of a 2 x 2 invertible matrix to find the inverse of that matrix. 

Find the determinant of an upper triangular, lower triangular, or diagonal matrix by inspection. 


Exercise Set 2.1 


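In Exercises 1-2, find all the minors and cofactors of the matrix A.

1. $$A = \begin{bmatrix} 1 & -2 & 3 \\ 6 & 7 & -1 \\ -3 & 1 & 4 \end{bmatrix}$$

Answer:

$M_{11} = 29,\; C_{11} = 29$
$M_{12} = 21,\; C_{12} = -21$
$M_{13} = 27,\; C_{13} = 27$
$M_{21} = -11,\; C_{21} = 11$
$M_{22} = 13,\; C_{22} = 13$
$M_{23} = -5,\; C_{23} = 5$
$M_{31} = -19,\; C_{31} = -19$
$M_{32} = -19,\; C_{32} = 19$
$M_{33} = 19,\; C_{33} = 19$

2. $$A = \begin{bmatrix} 1 & 1 & 2 \\ 3 & 3 & 6 \\ 0 & 1 & 4 \end{bmatrix}$$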

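3. Let

$$A = \begin{bmatrix} 4 & -1 & 1 & 6 \\ 0 & 0 & -3 & 3 \\ 4 & 1 & 0 & 14 \\ 4 & 1 & 3 & 2 \end{bmatrix}$$

Find

(a) $M_{13}$ and $C_{13}$.

(b) $M_{23}$ and $C_{23}$.

(c) $M_{22}$ and $C_{22}$.

(d) $M_{21}$ and $C_{21}$.

Answer:

(a) $M_{13} = 0,\; C_{13} = 0$

(b) $M_{23} = -96,\; C_{23} = 96$

(c) $M_{22} = -48,\; C_{22} = -48$

(d) $M_{21} = 72,\; C_{21} = -72$

4. Let

Find

(a) $M_{32}$ and $C_{32}$.

(b) $M_{44}$ and $C_{44}$.

(c) $M_{41}$ and $C_{41}$.

(d) $M_{24}$ and $C_{24}$.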



3 -1 1 
2 0 3 

-2 1 0 
-2 1 4 


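In Exercises 5-8, evaluate the determinant of the given matrix. If the matrix is invertible, use Equation 2 to find its inverse.

5. $$\begin{bmatrix} 3 & 5 \\ -2 & 4 \end{bmatrix}$$

Answer:

22; $$\begin{bmatrix} \tfrac{2}{11} & -\tfrac{5}{22} \\ \tfrac{1}{11} & \tfrac{3}{22} \end{bmatrix}$$

6. $$\begin{bmatrix} 4 & 1 \\ 8 & 2 \end{bmatrix}$$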


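7. $$\begin{bmatrix} -5 & 7 \\ -7 & -2 \end{bmatrix}$$

Answer:

59; $$\begin{bmatrix} -\tfrac{2}{59} & -\tfrac{7}{59} \\ \tfrac{7}{59} & -\tfrac{5}{59} \end{bmatrix}$$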

/2 


4 /3 



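In Exercises 9-14, use the arrow technique to evaluate the determinant of the given matrix.

9. $$\begin{bmatrix} a - 3 & 5 \\ -3 & a - 2 \end{bmatrix}$$

Answer:

$a^2 - 5a + 21$

10. $$\begin{bmatrix} -2 & 7 & 6 \\ 5 & 1 & -2 \\ 3 & 8 & 4 \end{bmatrix}$$

11. $$\begin{bmatrix} -2 & 1 & 4 \\ 3 & 5 & -7 \\ 1 & 6 & 2 \end{bmatrix}$$

Answer:

$-65$

12. $$\begin{bmatrix} -1 & 1 & 2 \\ 3 & 0 & -5 \\ 1 & 7 & 2 \end{bmatrix}$$

13. $$\begin{bmatrix} 3 & 0 & 0 \\ 2 & -1 & 5 \\ 1 & 9 & -4 \end{bmatrix}$$

Answer:

$-123$

14. $$\begin{bmatrix} c & -4 & 3 \\ 2 & 1 & c^2 \\ 4 & c - 1 & 2 \end{bmatrix}$$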


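In Exercises 15-18, find all values of $\lambda$ for which $\det(A) = 0$.

15. $$A = \begin{bmatrix} \lambda - 2 & 1 \\ -5 & \lambda + 4 \end{bmatrix}$$

Answer:

$\lambda = 1$ or $\lambda = -3$

16. $$A = \begin{bmatrix} \lambda - 4 & 0 & 0 \\ 0 & \lambda & 2 \\ 0 & 3 & \lambda - 1 \end{bmatrix}$$

17. $$A = \begin{bmatrix} \lambda - 1 & 0 \\ 2 & \lambda + 1 \end{bmatrix}$$

Answer:

$\lambda = \pm 1$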

18. 

A —4 

4 0 


A= -1 

A 0 


0 

1 

< 

O 


19. Evaluate the determinant of the matrix in Exercise 13 by a cofactor expansion along 

(a) the first row. 

(b) the first column. 

(c) the second row. 

(d) the second column. 

(e) the third row. 

(f) the third column. 

Answer: 

(all parts) $-123$

20. Evaluate the determinant of the matrix in Exercise 12 by a cofactor expansion along 

(a) the first row. 

(b) the first column. 

(c) the second row. 

(d) the second column. 

(e) the third row. 

(f) the third column. 

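In Exercises 21-26, evaluate det(A) by a cofactor expansion along a row or column of your choice.

21. $$A = \begin{bmatrix} -3 & 0 & 7 \\ 2 & 5 & 1 \\ -1 & 0 & 5 \end{bmatrix}$$

Answer:

$-40$

22. $$A = \begin{bmatrix} 3 & 3 & 1 \\ 1 & 0 & -4 \\ 1 & -3 & 5 \end{bmatrix}$$

23. $$A = \begin{bmatrix} 1 & k & k^2 \\ 1 & k & k^2 \\ 1 & k & k^2 \end{bmatrix}$$

Answer:

0

24. $$A = \begin{bmatrix} k + 1 & k - 1 & 7 \\ 2 & k - 3 & 4 \\ 5 & k + 1 & k \end{bmatrix}$$

25. $$A = \begin{bmatrix} 3 & 3 & 0 & 5 \\ 2 & 2 & 0 & -2 \\ 4 & 1 & -3 & 0 \\ 2 & 10 & 3 & 2 \end{bmatrix}$$

Answer:

$-240$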


26. 

4 

0 

0 

1 

0 


3 

3 

3 

-1 

0 

A = 

1 

2 

4 

2 

3 


9 

4 

6 

2 

3 


2 

2 

4 

2 

3 


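In Exercises 27-32, evaluate the determinant of the given matrix by inspection.

27. $$\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Answer:

$-1$

28. $$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

29. $$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 4 & 3 & 0 \\ 1 & 2 & 3 & 8 \end{bmatrix}$$

Answer:

0

30. $$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 4 \end{bmatrix}$$

31. $$\begin{bmatrix} 1 & 2 & 7 & -3 \\ 0 & 1 & -4 & 1 \\ 0 & 0 & 2 & 7 \\ 0 & 0 & 0 & 3 \end{bmatrix}$$

Answer:

6

32. $$\begin{bmatrix} -3 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 40 & 10 & -1 & 0 \\ 100 & 200 & -23 & 3 \end{bmatrix}$$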


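33. Show that the value of the following determinant is independent of $\theta$.

$$\begin{vmatrix} \sin\theta & \cos\theta & 0 \\ -\cos\theta & \sin\theta & 0 \\ \sin\theta - \cos\theta & \sin\theta + \cos\theta & 1 \end{vmatrix}$$

Answer:

The determinant is $\sin^2\theta + \cos^2\theta = 1$.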

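34. Show that the matrices

$$A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} d & e \\ 0 & f \end{bmatrix}$$

commute if and only if

$$\begin{vmatrix} b & a - c \\ e & d - f \end{vmatrix} = 0$$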


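35. By inspection, what is the relationship between the following determinants?

$$d_1 = \begin{vmatrix} a & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix} \quad\text{and}\quad d_2 = \begin{vmatrix} a + \lambda & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix}$$

Answer:

$d_2 = d_1 + \lambda$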

36. Show that 

tr (A) 1 

\x(A 2 ) tr(j4) 

for every 2x2 matrix A. 

37. What can you say about an nth-order determinant all of whose entries are 1? Explain your reasoning.

38. What is the maximum number of zeros that a 3 x 3 matrix can have without having a zero determinant? Explain your 
reasoning. 

39. What is the maximum number of zeros that a 4 x 4 matrix can have without having a zero determinant? Explain your 
reasoning. 

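40. Prove that $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ are collinear points if and only if

$$\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = 0$$

41. Prove that the equation of the line through the distinct points $(a_1, b_1)$ and $(a_2, b_2)$ can be written as

$$\begin{vmatrix} x & y & 1 \\ a_1 & b_1 & 1 \\ a_2 & b_2 & 1 \end{vmatrix} = 0$$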


42. Prove that if A is upper triangular and $B_{ij}$ is the matrix that results when the $i$th row and $j$th column of A are deleted, then $B_{ij}$ is upper triangular if $i < j$.

True-False Exercises 


In parts (a)-(i) determine whether the statement is true or false, and justify your answer. 


(a) The determinant of the $2 \times 2$ matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $ad + bc$.


Answer: 

False 

(b) Two square matrices A and B can have the same determinant only if they are the same size. 
Answer: 

False 

(c) The minor $M_{ij}$ is the same as the cofactor $C_{ij}$ if and only if $i + j$ is even.

Answer: 

True 

(d) If A is a $3 \times 3$ symmetric matrix, then $C_{ij} = C_{ji}$ for all i and j.

Answer: 


True 

(e) The value of a cofactor expansion of a matrix A is independent of the row or column chosen for the expansion. 
Answer: 


True 

(f) The determinant of a lower triangular matrix is the sum of the entries along its main diagonal. 
Answer: 

False 

(g) For every square matrix A and every scalar c, we have $\det(cA) = c\det(A)$.

Answer: 

False 

(h) For all square matrices A and B, we have $\det(A + B) = \det(A) + \det(B)$.

Answer: 


False 






(i) For every $2 \times 2$ matrix A, we have $\det(A^2) = (\det(A))^2$.

Answer: 

True 





2.2 Evaluating Determinants by Row Reduction 

In this section we will show how to evaluate a determinant by reducing the associated matrix to row echelon form. In 
general, this method requires less computation than cofactor expansion and hence is the method of choice for large 
matrices. 


A Basic Theorem 

We begin with a fundamental theorem that will lead us to an efficient procedure for evaluating the determinant of a square 
matrix of any size. 


THEOREM 2.2.1 

Let A be a square matrix. If A has a row of zeros or a column of zeros, then $\det(A) = 0$.


Since the determinant of A can be found by a cofactor expansion along any row or column, we can use the row or column of zeros. Thus, if we let $C_1, C_2, \ldots, C_n$ denote the cofactors of A along that row or column, then it follows from Formula 5 or 6 in Section 2.1 that

$$\det(A) = 0 \cdot C_1 + 0 \cdot C_2 + \cdots + 0 \cdot C_n = 0$$


The following useful theorem relates the determinant of a matrix and the determinant of its transpose. 


THEOREM 2.2.2

Let A be a square matrix. Then $\det(A) = \det(A^T)$.


Because transposing a matrix changes its columns to 
rows and its rows to columns, almost every theorem 
about the rows of a determinant has a companion 
version about columns, and vice versa. 


Since transposing a matrix changes its columns to rows and its rows to columns, the cofactor expansion of $A^T$ along any row is the same as the cofactor expansion of A along the corresponding column. Thus, both have the same determinant.


Elementary Row Operations 


The next theorem shows how an elementary row operation on a square matrix affects the value of its determinant. In place of a formal proof we have provided a table to illustrate the ideas in the $3 \times 3$ case (see Table 1).


THEOREM 2.2.3 

Let A be an $n \times n$ matrix.

(a) If B is the matrix that results when a single row or single column of A is multiplied by a scalar k, then $\det(B) = k\det(A)$.

(b) If B is the matrix that results when two rows or two columns of A are interchanged, then $\det(B) = -\det(A)$.

(c) If B is the matrix that results when a multiple of one row of A is added to another row or when a multiple of one column is added to another column, then $\det(B) = \det(A)$.


The first panel of Table 1 shows that you can bring a 
common factor from any row (column) of a 
determinant through the determinant sign. This is a 
slightly different way of thinking about part (a) of 
Theorem 2.2.3. 


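Table 1

Relationship:
$$\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = k\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = k\det(A)$$
Operation: The first row of A is multiplied by k.

Relationship:
$$\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = -\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = -\det(A)$$
Operation: The first and second rows of A are interchanged.

Relationship:
$$\begin{vmatrix} a_{11} + ka_{21} & a_{12} + ka_{22} & a_{13} + ka_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = \det(A)$$
Operation: A multiple of the second row of A is added to the first row.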


We will verify the first equation in Table 1 and leave the other two for you. To start, note that the determinants on the two sides of the equation differ only in the first row, so these determinants have the same cofactors, $C_{11}, C_{12}, C_{13}$, along that row (since those cofactors depend only on the entries in the second two rows). Thus, expanding the left side by cofactors along the first row yields

$$\begin{aligned} \begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= ka_{11}C_{11} + ka_{12}C_{12} + ka_{13}C_{13} \\ &= k(a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}) \\ &= k\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \end{aligned}$$

Elementary Matrices 

It will be useful to consider the special case of Theorem 2.2.3 in which $A = I_n$ is the $n \times n$ identity matrix and E (rather than B) denotes the elementary matrix that results when the row operation is performed on $I_n$. In this special case Theorem 2.2.3 implies the following result.

THEOREM 2.2.4

Let E be an $n \times n$ elementary matrix.

(a) If E results from multiplying a row of $I_n$ by a nonzero number k, then $\det(E) = k$.

(b) If E results from interchanging two rows of $I_n$, then $\det(E) = -1$.

(c) If E results from adding a multiple of one row of $I_n$ to another, then $\det(E) = 1$.


EXAMPLE 1 Determinants of Elementary Matrices 

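The following determinants of elementary matrices, which are evaluated by inspection, illustrate Theorem 2.2.4.

Observe that the determinant of an elementary matrix cannot be zero.

$$\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 3, \qquad \begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix} = -1, \qquad \begin{vmatrix} 1 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 1$$

First: the second row of $I_4$ was multiplied by 3.
Second: the first and last rows of $I_4$ were interchanged.
Third: 7 times the last row of $I_4$ was added to the first row.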


Matrices with Proportional Rows or Columns 


If a square matrix A has two proportional rows, then a row of zeros can be introduced by adding a suitable multiple of one of the rows to the other. Similarly for columns. But adding a multiple of one row or column to another does not change the determinant, so from Theorem 2.2.1, we must have $\det(A) = 0$. This proves the following theorem.


THEOREM 2.2.5 

If A is a square matrix with two proportional rows or two proportional columns, then $\det(A) = 0$.


EXAMPLE 2 Introducing Zero Rows 


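The following computation shows how to introduce a row of zeros when there are two proportional rows.

$$\begin{vmatrix} 1 & 3 & -2 & 4 \\ 2 & 6 & -4 & 8 \\ 3 & 9 & 1 & 5 \\ 1 & 1 & 4 & 8 \end{vmatrix} = \begin{vmatrix} 1 & 3 & -2 & 4 \\ 0 & 0 & 0 & 0 \\ 3 & 9 & 1 & 5 \\ 1 & 1 & 4 & 8 \end{vmatrix} = 0$$

The second row is 2 times the first, so we added $-2$ times the first row to the second to introduce a row of zeros.

Each of the following matrices has two proportional rows or columns; thus, each has a determinant of zero.

$$\begin{bmatrix} -1 & 4 \\ -2 & 8 \end{bmatrix}, \qquad \begin{bmatrix} 1 & -2 & 7 \\ -4 & 8 & 5 \\ 2 & -4 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 3 & -1 & 4 & -5 \\ 6 & -2 & 5 & 2 \\ 5 & 8 & 1 & 4 \\ -9 & 3 & -12 & 15 \end{bmatrix}$$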


Evaluating Determinants by Row Reduction 


We will now give a method for evaluating determinants that involves substantially less computation than cofactor 
expansion. The idea of the method is to reduce the given matrix to upper triangular form by elementary row operations, 
then compute the determinant of the upper triangular matrix (an easy computation), and then relate that determinant to 
that of the original matrix. Here is an example. 


EXAMPLE 3 Using Row Reduction to Evaluate a Determinant 


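Evaluate det(A) where

$$A = \begin{bmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{bmatrix}$$

We will reduce A to row echelon form (which is upper triangular) and then apply Theorem 2.1.2.

Even with today’s fastest computers it would take millions of years to calculate a $25 \times 25$ determinant by cofactor expansion, so methods based on row reduction are often used for large determinants. For determinants of small size (such as those in this text), cofactor expansion is often a reasonable choice.

$$\begin{aligned} \det(A) = \begin{vmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{vmatrix} &= -\begin{vmatrix} 3 & -6 & 9 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} && \text{The first and second rows of } A \text{ were interchanged.} \\ &= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} && \text{A common factor of 3 from the first row was taken through the determinant sign.} \\ &= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 10 & -5 \end{vmatrix} && -2 \text{ times the first row was added to the third row.} \\ &= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & -55 \end{vmatrix} && -10 \text{ times the second row was added to the third row.} \\ &= (-3)(-55)\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{vmatrix} && \text{A common factor of } -55 \text{ from the last row was taken through the determinant sign.} \\ &= (-3)(-55)(1) = 165 \end{aligned}$$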
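The procedure of Example 3 can be sketched in Python as follows, assuming exact arithmetic with the standard `fractions` module; the function name `det_by_row_reduction` is hypothetical. Unlike the hand computation, this sketch does not factor constants out of rows. It uses only row interchanges, which flip the sign (Theorem 2.2.3(b)), and additions of a multiple of one row to another, which leave the determinant unchanged (Theorem 2.2.3(c)). The determinant is then the signed product of the diagonal entries of the triangular result (Theorem 2.1.2).

```python
from fractions import Fraction


def det_by_row_reduction(A):
    """Determinant via reduction to upper triangular form (exact arithmetic)."""
    M = [[Fraction(v) for v in row] for row in A]
    n = len(M)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            # No nonzero pivot: the triangular form has a zero on its diagonal.
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign  # an interchange flips the sign
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            # Adding a multiple of one row to another: no change.
            M[r] = [u - f * w for u, w in zip(M[r], M[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]  # product of the diagonal entries
    return result


print(det_by_row_reduction([[0, 1, 5], [3, -6, 9], [2, 6, 1]]))  # 165
print(det_by_row_reduction([[1, 0, 0, 3], [2, 7, 0, 6],
                            [0, 6, 3, 0], [7, 3, 1, -5]]))       # -546
```

The number of arithmetic operations grows roughly like $n^3$, compared with roughly $n!$ for a full cofactor expansion, which is why row reduction is preferred for large matrices.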


EXAMPLE 4 Using Column Operations to Evaluate a Determinant 


Compute the determinant of

$$A = \begin{bmatrix} 1 & 0 & 0 & 3 \\ 2 & 7 & 0 & 6 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -5 \end{bmatrix}$$

This determinant could be computed as above by using elementary row operations to reduce A to row echelon form, but we can put A in lower triangular form in one step by adding $-3$ times the first column to the fourth to obtain

$$\det(A) = \det\begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 7 & 0 & 0 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -26 \end{bmatrix} = (1)(7)(3)(-26) = -546$$


Example 4 points out that it is always wise to keep an eye open for column operations that can shorten computations.


Cofactor expansion and row or column operations can sometimes be used in combination to provide an effective method 
for evaluating determinants. The following example illustrates this idea. 

EXAMPLE 5 Row Operations and Cofactor Expansion 


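Evaluate det(A) where

$$A = \begin{bmatrix} 3 & 5 & -2 & 6 \\ 1 & 2 & -1 & 1 \\ 2 & 4 & 1 & 5 \\ 3 & 7 & 5 & 3 \end{bmatrix}$$

By adding suitable multiples of the second row to the remaining rows, we obtain

$$\begin{aligned} \det(A) &= \begin{vmatrix} 0 & -1 & 1 & 3 \\ 1 & 2 & -1 & 1 \\ 0 & 0 & 3 & 3 \\ 0 & 1 & 8 & 0 \end{vmatrix} \\ &= -\begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 1 & 8 & 0 \end{vmatrix} && \text{Cofactor expansion along the first column.} \\ &= -\begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 0 & 9 & 3 \end{vmatrix} && \text{We added the first row to the third row.} \\ &= -(-1)\begin{vmatrix} 3 & 3 \\ 9 & 3 \end{vmatrix} && \text{Cofactor expansion along the first column.} \\ &= -18 \end{aligned}$$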


Skills 

Know the effect of elementary row operations on the value of a determinant. 

Know the determinants of the three types of elementary matrices. 

Know how to introduce zeros into the rows or columns of a matrix to facilitate the evaluation of its determinant. 
Use row reduction to evaluate the determinant of a matrix. 

Use column operations to evaluate the determinant of a matrix. 

Combine the use of row reduction and cofactor expansion to evaluate the determinant of a matrix. 


Exercise Set 2.2 

In Exercises 1-4, verify that det(A) = det($A^T$).
A= 1 2 4 

_5 -3 6_ 

4. $A = \begin{bmatrix} 4 & 2 & -1 \\ 0 & 2 & -3 \\ -1 & 1 & 5 \end{bmatrix}$

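In Exercises 5-9, find the determinant of the given elementary matrix by inspection.

5. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

-5

6. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -5 & 0 & 1 \end{bmatrix}$

7. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

-1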

8 . 1 " 1 0 0 o’ 

0 -j 0 0 

0 0 10 
0 0 0 1 

9. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -9 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

1

In Exercises 10-17, evaluate the determinant of the given matrix by reducing the matrix to row echelon form.

10. $\begin{bmatrix} 3 & 6 & -9 \\ 0 & 0 & -2 \\ -2 & 1 & 5 \end{bmatrix}$

11. $\begin{bmatrix} 0 & 3 & 1 \\ 1 & 1 & 2 \\ 3 & 2 & 4 \end{bmatrix}$

Answer:

5

12 . r -3 O' 

-2 4 1 

-2 2 _ 

13. f -6 9' 

-2 7 -2 

1 5 

Answer: 

33 

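14. $\begin{bmatrix} 1 & -2 & 3 & 1 \\ 5 & -9 & 6 & 3 \\ -1 & 2 & -6 & -2 \\ 2 & 8 & 6 & 1 \end{bmatrix}$

15. $\begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 2 & 3 \end{bmatrix}$

Answer:

6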

16. r oi i i 


-If 00 

1 7 . r i 31 

-2 -7 0 
0 1 
0 2 
0 0 

Answer: 

-2 

18. Repeat Exercises 10-13 by using a combination of row reduction and cofactor expansion. 

19. Repeat Exercises 14-17 by using a combination of row operations and cofactor expansion. 

Answer: 

Exercise 14: 39; Exercise 15: 6; Exercise 16: — -i; Exercise 17: —2 


5 3 
-4 2 
0 1 
1 1 
1 1 


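In Exercises 20-27, evaluate the determinant, given that

$$\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = -6$$

20. $\begin{vmatrix} g & h & i \\ d & e & f \\ a & b & c \end{vmatrix}$

21. $\begin{vmatrix} d & e & f \\ g & h & i \\ a & b & c \end{vmatrix}$

Answer:

-6

22. $\begin{vmatrix} a & b & c \\ d & e & f \\ 2a & 2b & 2c \end{vmatrix}$

23. $\begin{vmatrix} 3a & 3b & 3c \\ -d & -e & -f \\ 4g & 4h & 4i \end{vmatrix}$

Answer:

72

24. $\begin{vmatrix} a+d & b+e & c+f \\ -d & -e & -f \\ g & h & i \end{vmatrix}$

25. $\begin{vmatrix} a+g & b+h & c+i \\ d & e & f \\ g & h & i \end{vmatrix}$

Answer:

-6

26. $\begin{vmatrix} a & b & c \\ 2d & 2e & 2f \\ g+3a & h+3b & i+3c \end{vmatrix}$

27. $\begin{vmatrix} -3a & -3b & -3c \\ d & e & f \\ g-4d & h-4e & i-4f \end{vmatrix}$

Answer:

18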


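28. Show that

(a) $\det\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = -a_{13}a_{22}a_{31}$

(b) $\det\begin{bmatrix} 0 & 0 & 0 & a_{14} \\ 0 & 0 & a_{23} & a_{24} \\ 0 & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = a_{14}a_{23}a_{32}a_{41}$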


29. Use row reduction to show that

$$\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b - a)(c - a)(c - b)$$

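In Exercises 30-33, confirm the identities without evaluating the determinants directly.

30. $\begin{vmatrix} a_1 + b_1t & a_2 + b_2t & a_3 + b_3t \\ a_1t + b_1 & a_2t + b_2 & a_3t + b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = (1 - t^2)\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$

31. $\begin{vmatrix} a_1 & b_1 & a_1 + b_1 + c_1 \\ a_2 & b_2 & a_2 + b_2 + c_2 \\ a_3 & b_3 & a_3 + b_3 + c_3 \end{vmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}$

32. $\begin{vmatrix} a_1 & b_1 + ta_1 & c_1 + rb_1 + sa_1 \\ a_2 & b_2 + ta_2 & c_2 + rb_2 + sa_2 \\ a_3 & b_3 + ta_3 & c_3 + rb_3 + sa_3 \end{vmatrix} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$

33. $\begin{vmatrix} a_1 + b_1 & a_1 - b_1 & c_1 \\ a_2 + b_2 & a_2 - b_2 & c_2 \\ a_3 + b_3 & a_3 - b_3 & c_3 \end{vmatrix} = -2\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}$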


34. Find the determinant of the following matrix. 

a b b b 
b a b b 
b b a b 
b b b a 


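In Exercises 35-36, show that det(A) = 0 without directly evaluating the determinant.

35. $A = \begin{bmatrix} -2 & 8 & 1 & 4 \\ 3 & 2 & 5 & 1 \\ 1 & 10 & 6 & 5 \\ 4 & -6 & 4 & -3 \end{bmatrix}$

36. $A = \begin{bmatrix} -4 & 1 & 1 & 1 & 1 \\ 1 & -4 & 1 & 1 & 1 \\ 1 & 1 & -4 & 1 & 1 \\ 1 & 1 & 1 & -4 & 1 \\ 1 & 1 & 1 & 1 & -4 \end{bmatrix}$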


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer.

(a) If A is a 4 x 4 matrix and B is obtained from A by interchanging the first two rows and then interchanging the last two rows, then det(B) = det(A).

Answer: 

True 

(b) If A is a 3 x 3 matrix and B is obtained from A by multiplying the first column by 4 and multiplying the third column by 3/4, then det(B) = 3 det(A).

Answer: 

True 

(c) If A is a 3 x 3 matrix and B is obtained from A by adding 5 times the first row to each of the second and third rows, then det(B) = 25 det(A).

Answer: 

False 

(d) If A is an n x n matrix and B is obtained from A by multiplying each row of A by its row number, then 

det(S) = det(^) 


Answer: 

False 

(e) If A is a square matrix with two identical columns, then det(A) = 0.

Answer: 

True 

(f) If the sum of the second and fourth row vectors of a 6 x 6 matrix A is equal to the last row vector, then det(A) = 0.
Answer: 

True 
2.3 Properties of Determinants; Cramer's Rule

In this section we will develop some fundamental properties of matrices, and we will use these results to derive a 
formula for the inverse of an invertible matrix and formulas for the solutions of certain kinds of linear systems. 


Basic Properties of Determinants 


Suppose that A and B are n x n matrices and k is any scalar. We begin by considering possible relationships between det(A), det(B), and

det(kA), det(A + B), and det(AB)

Since a common factor of any row of a matrix can be moved through the determinant sign, and since each of the n rows in kA has a common factor of k, it follows that

$$\det(kA) = k^n \det(A) \tag{1}$$

For example,

$$\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ ka_{21} & ka_{22} & ka_{23} \\ ka_{31} & ka_{32} & ka_{33} \end{vmatrix} = k^3\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$
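Formula (1) is easy to spot-check numerically. The Python sketch below (an illustration, not from the text) uses a hypothetical `det` helper based on cofactor expansion along the first row, a hypothetical `scale` helper for kA, and an arbitrary 3 x 3 test matrix that does not come from the book. Exact `Fraction` arithmetic avoids floating-point round-off for the non-integer scalar.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def scale(k, M):
    """The matrix kM: every entry (hence every row) of M multiplied by k."""
    return [[k * entry for entry in row] for row in M]


A = [[1, 2, 0],
     [3, -1, 4],
     [0, 5, 2]]   # an arbitrary 3 x 3 test matrix (not from the text)
n = len(A)

for k in (2, -3, Fraction(1, 2)):
    # Each of the n rows of kA contributes one factor of k.
    assert det(scale(k, A)) == k ** n * det(A)

print(det(A), det(scale(2, A)))  # -34 -272
```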


Unfortunately, no simple relationship exists among det(A), det(B), and det(A + B). In particular, we emphasize that det(A + B) will usually not be equal to det(A) + det(B). The following example illustrates this fact.

EXAMPLE 1 det(A + B) ≠ det(A) + det(B)


Consider 




1 

3 


$$A + B = \begin{bmatrix} 4 & 3 \\ 3 & 8 \end{bmatrix}$$

We have det(A) = 1, det(B) = 8, and det(A + B) = 23; thus

det(A + B) ≠ det(A) + det(B)


In spite of the previous example, there is a useful relationship concerning sums of determinants that is applicable 
when the matrices involved are the same except for one row (column). For example, consider the following two 
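matrices that differ only in the second row:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}$$

Calculating the determinants of A and B, we obtain

$$\begin{aligned}
\det(A) + \det(B) &= (a_{11}a_{22} - a_{12}a_{21}) + (a_{11}b_{22} - a_{12}b_{21}) \\
&= a_{11}(a_{22} + b_{22}) - a_{12}(a_{21} + b_{21}) \\
&= \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
\end{aligned}$$

Thus

$$\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \det\begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix} = \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}$$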


This is a special case of the following general result. 


THEOREM 2.3.1 

Let A, B, and C be n x n matrices that differ only in a single row, say the rth, and assume that the rth row of C can be obtained by adding corresponding entries in the rth rows of A and B. Then

det(C) = det(A) + det(B)

The same result holds for columns. 


EXAMPLE 2 Sums of Determinants 

We leave it to you to confirm the following equality by evaluating the determinants. 


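$$\det\begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 + 0 & 4 + 1 & 7 + (-1) \end{bmatrix} = \det\begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 & 4 & 7 \end{bmatrix} + \det\begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 0 & 1 & -1 \end{bmatrix}$$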
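One way to carry out the confirmation requested in Example 2 is the Python sketch below (not part of the text; `det3` is a hypothetical helper). It builds the three matrices from their shared first two rows and the third rows shown above, then checks the row-additivity of Theorem 2.3.1 for this case.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


shared = [[1, 7, 5],
          [2, 0, 3]]                       # rows common to all three matrices
r = [1, 4, 7]                              # third row of the first summand
s = [0, 1, -1]                             # third row of the second summand
r_plus_s = [x + y for x, y in zip(r, s)]   # third row of the left-hand matrix: [1, 5, 6]

left = det3(shared + [r_plus_s])
right = det3(shared + [r]) + det3(shared + [s])
print(det3(shared + [r]), det3(shared + [s]))  # -49 21
print(left, right)                             # -28 -28
```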


Determinant of a Matrix Product 

Considering the complexity of the formulas for determinants and matrix multiplication, it would seem unlikely 
that a simple relationship should exist between them. This is what makes the simplicity of our next result so 
surprising. We will show that if A and B are square matrices of the same size, then 

$$\det(AB) = \det(A)\det(B) \tag{2}$$

The proof of this theorem is fairly intricate, so we will have to develop some preliminary results first. We begin with the special case of Formula (2) in which A is an elementary matrix. Because this special case is only a prelude to Formula (2), we call it a lemma.
















LEMMA 2.3.2 


If B is an n x n matrix and E is an n x n elementary matrix, then

det(EB) = det(E) det(B)

We will consider three cases, each in accordance with the row operation that produces the matrix E.

Case 1: If E results from multiplying a row of $I_n$ by k, then by Theorem 1.5.1, EB results from B by multiplying the corresponding row by k; so from Theorem 2.2.3(a) we have

det(EB) = k det(B)

But from Theorem 2.2.4(a) we have det(E) = k, so

det(EB) = det(E) det(B)

The proofs of the cases where E results from interchanging two rows of $I_n$ or from adding a multiple of one row to another follow the same pattern as Case 1 and are left as exercises.

It follows by repeated applications of Lemma 2.3.2 that if B is an n x n matrix and $E_1, E_2, \ldots, E_r$ are n x n elementary matrices, then

$$\det(E_1E_2\cdots E_rB) = \det(E_1)\det(E_2)\cdots\det(E_r)\det(B) \tag{3}$$
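Lemma 2.3.2 can be checked on all three kinds of elementary matrix at once. In this Python sketch (an illustration, not from the text) the helpers `det`, `matmul`, and `identity` are hypothetical, and B is an arbitrary test matrix; each elementary matrix is built by applying one row operation to $I_3$.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def matmul(A, B):
    """Product AB of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


B = [[2, -1, 0],
     [4, 3, 5],
     [-2, 1, 6]]   # arbitrary test matrix (not from the text)

E_scale = identity(3)
E_scale[1][1] = 5                             # multiply row 2 of I_3 by 5
E_swap = identity(3)
E_swap[0], E_swap[2] = E_swap[2], E_swap[0]   # interchange rows 1 and 3 of I_3
E_add = identity(3)
E_add[2][0] = -4                              # add -4 times row 1 to row 3 of I_3

for E in (E_scale, E_swap, E_add):
    assert det(matmul(E, B)) == det(E) * det(B)

print(det(E_scale), det(E_swap), det(E_add))  # 5 -1 1
```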


Determinant Test for Invertibility 

Our next theorem provides an important criterion for determining whether a matrix is invertible. It also takes us a 
step closer to establishing Formula 2. 


THEOREM 2.3.3 

A square matrix A is invertible if and only if det(A) ≠ 0.

Let R be the reduced row echelon form of A. As a preliminary step, we will show that det(A) and det(R) are both zero or both nonzero: Let $E_1, E_2, \ldots, E_r$ be the elementary matrices that correspond to the elementary row operations that produce R from A. Thus

$$R = E_r\cdots E_2E_1A$$

and from Formula (3),

$$\det(R) = \det(E_r)\cdots\det(E_2)\det(E_1)\det(A) \tag{4}$$

We pointed out in the margin note that accompanies Theorem 2.2.4 that the determinant of an elementary matrix is nonzero. Thus, it follows from Formula (4) that det(A) and det(R) are either both zero or both nonzero, which sets the stage for the main part of the proof. If we assume first that A is invertible, then it follows from Theorem 1.6.4 that R = I and hence that det(R) = 1 (≠ 0). This, in turn, implies that det(A) ≠ 0, which is what we wanted to show.

It follows from Theorems 2.3.3 and 2.2.5 that a square matrix with two proportional rows or two proportional columns is not invertible.

Conversely, assume that det(A) ≠ 0. It follows from this that det(R) ≠ 0, which tells us that R cannot have a row of zeros. Thus, it follows from Theorem 1.4.3 that R = I and hence that A is invertible by Theorem 1.6.4.

EXAMPLE 3 Determinant Test for Invertibility 


Since the first and third rows of

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 2 & 4 & 6 \end{bmatrix}$$

are proportional, det(A) = 0. Thus A is not invertible.
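The determinant test of Theorem 2.3.3 is usually carried out by row reduction rather than cofactor expansion. The Python sketch below is one hedged way to do that in exact arithmetic; the function names `det_by_elimination` and `is_invertible` are hypothetical. It tracks the sign change from each row interchange and uses only "add a multiple of one row to another" otherwise, so the determinant is the signed product of the pivots.

```python
from fractions import Fraction


def det_by_elimination(M):
    """Determinant via reduction to upper triangular form, in exact arithmetic.

    A row interchange flips the sign; adding a multiple of one row to another
    leaves the determinant unchanged.
    """
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column, so a zero lands on the diagonal
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    product = Fraction(sign)
    for i in range(n):
        product *= A[i][i]
    return product


def is_invertible(M):
    """Determinant test for invertibility."""
    return det_by_elimination(M) != 0


A = [[1, 2, 3],
     [1, 0, 1],
     [2, 4, 6]]   # the matrix of Example 3: row 3 is 2 times row 1
print(det_by_elimination(A), is_invertible(A))  # 0 False
```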


We are now ready for the main result concerning products of matrices. 


THEOREM 2.3.4 

If A and B are square matrices of the same size, then

det(AB) = det(A) det(B)

We divide the proof into two cases that depend on whether or not A is invertible. If the matrix A is not invertible, then by Theorem 1.6.5 neither is the product AB. Thus, from Theorem 2.3.3, we have det(AB) = 0 and det(A) = 0, so it follows that det(AB) = det(A) det(B).





Augustin Louis Cauchy (1789-1857) 

In 1815 the great French mathematician Augustin Cauchy published a landmark paper 
in which he gave the first systematic and modern treatment of determinants. It was in that paper that
Theorem 2.3.4 was stated and proved in full generality for the first time. Special cases of the theorem had 
been stated and proved earlier, but it was Cauchy who made the final jump. 
Now assume that A is invertible. By Theorem 1.6.4, the matrix A is expressible as a product of elementary matrices, say

$$A = E_1E_2\cdots E_r \tag{5}$$

so

$$AB = E_1E_2\cdots E_rB$$

Applying (3) to this equation yields

$$\det(AB) = \det(E_1)\det(E_2)\cdots\det(E_r)\det(B)$$

and applying (3) again yields

$$\det(AB) = \det(E_1E_2\cdots E_r)\det(B)$$

which, from (5), can be written as det(AB) = det(A) det(B).

EXAMPLE 4 Verifying That det(AB) = det(A) det(B)

Consider the matrices

$$A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 3 \\ 5 & 8 \end{bmatrix}, \quad AB = \begin{bmatrix} 2 & 17 \\ 3 & 14 \end{bmatrix}$$

We leave it for you to verify that

det(A) = 1, det(B) = -23, and det(AB) = -23

Thus det(AB) = det(A) det(B), as guaranteed by Theorem 2.3.4.
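The verification left to the reader in Example 4 takes only a few lines of Python (a sketch, not part of the text; `det2` and `matmul` are hypothetical helpers). Here A is taken to be [[3, 1], [2, 1]], the matrix consistent with the B and AB shown in the example.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c


def matmul(A, B):
    """Product AB of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[3, 1], [2, 1]]
B = [[-1, 3], [5, 8]]
AB = matmul(A, B)
print(AB)                          # [[2, 17], [3, 14]]
print(det2(A), det2(B), det2(AB))  # 1 -23 -23
assert det2(AB) == det2(A) * det2(B)
```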


The following theorem gives a useful relationship between the determinant of an invertible matrix and the 
determinant of its inverse. 









THEOREM 2.3.5 


If A is invertible, then

$$\det(A^{-1}) = \frac{1}{\det(A)}$$

Since $A^{-1}A = I$, it follows that $\det(A^{-1}A) = \det(I)$. Therefore, by Theorem 2.3.4 and the fact that det(I) = 1, we must have $\det(A^{-1})\det(A) = 1$. Since det(A) ≠ 0, the proof can be completed by dividing through by det(A).


Adjoint of a Matrix 

In a cofactor expansion we compute det(^4) by multiplying the entries in a row or column by their cofactors and 
adding the resulting products. It turns out that if one multiplies the entries in any row by the corresponding 
cofactors from a different row, the sum of these products is always zero. (This result also holds for columns.) 
Although we omit the general proof, the next example illustrates the idea of the proof in a special case. 


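If A is an invertible triangular matrix, it follows from Theorems 2.3.5 and 2.1.2 that

$$\det(A^{-1}) = \frac{1}{a_{11}}\cdot\frac{1}{a_{22}}\cdots\frac{1}{a_{nn}}$$

Moreover, by using the adjoint formula it is possible to show that

$$\frac{1}{a_{11}}, \frac{1}{a_{22}}, \ldots, \frac{1}{a_{nn}}$$

are actually the successive diagonal entries of $A^{-1}$ (compare A and $A^{-1}$ in Example 3 of Section 1.7).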


EXAMPLE 5 Entries and Cofactors from Different Rows 


Let

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

Consider the quantity

$$a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33}$$

that is formed by multiplying the entries in the first row by the cofactors of the corresponding entries in the third row and adding the resulting products. We can show that this quantity is equal to zero by the following trick: Construct a new matrix A' by replacing the third row of A with another copy of the first row. That is,

$$A' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{bmatrix}$$

Let $C'_{31}, C'_{32}, C'_{33}$ be the cofactors of the entries in the third row of A'. Since the first two rows of A and A' are the same, and since the computations of $C_{31}, C_{32}, C_{33}, C'_{31}, C'_{32}$, and $C'_{33}$ involve only entries from the first two rows of A and A', it follows that

$$C_{31} = C'_{31}, \quad C_{32} = C'_{32}, \quad C_{33} = C'_{33}$$

Since A' has two identical rows, it follows from Theorem 2.2.5 that

$$\det(A') = 0 \tag{6}$$

On the other hand, evaluating det(A') by cofactor expansion along the third row gives

$$\det(A') = a_{11}C'_{31} + a_{12}C'_{32} + a_{13}C'_{33} = a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} \tag{7}$$

From (6) and (7) we obtain

$$a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} = 0$$
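The same fact can be observed on a concrete matrix. The Python sketch below (not from the text; `det`, `cofactor`, and `row_times_cofactors` are hypothetical helpers with 0-based indices) forms, for every pair of rows i and j, the sum of the entries of row i times the cofactors of row j. The sum is det(A) when i = j and zero otherwise. The test matrix is the one used in Example 6 below.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def cofactor(M, i, j):
    """C_ij = (-1)**(i+j) times the minor obtained by deleting row i and column j."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
    return (-1) ** (i + j) * det(minor)


def row_times_cofactors(M, i, j):
    """Sum of the entries of row i times the cofactors of the corresponding entries of row j."""
    return sum(M[i][k] * cofactor(M, j, k) for k in range(len(M)))


A = [[3, 2, -1],
     [1, 6, 3],
     [2, -4, 0]]

n = len(A)
for i in range(n):
    for j in range(n):
        expected = det(A) if i == j else 0
        assert row_times_cofactors(A, i, j) == expected

# First-row entries times third-row cofactors, as in Example 5:
print(row_times_cofactors(A, 0, 2), det(A))  # 0 64
```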


DEFINITION 1 

If A is any n x n matrix and $C_{ij}$ is the cofactor of $a_{ij}$, then the matrix

$$\begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix}$$

is called the matrix of cofactors from A. The transpose of this matrix is called the adjoint of A and is denoted by adj(A).


EXAMPLE 6 Adjoint of a 3 x 3 Matrix 


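Let

$$A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 6 & 3 \\ 2 & -4 & 0 \end{bmatrix}$$

The cofactors of A are

$$\begin{array}{lll} C_{11} = 12 & C_{12} = 6 & C_{13} = -16 \\ C_{21} = 4 & C_{22} = 2 & C_{23} = 16 \\ C_{31} = 12 & C_{32} = -10 & C_{33} = 16 \end{array}$$

so the matrix of cofactors is

$$\begin{bmatrix} 12 & 6 & -16 \\ 4 & 2 & 16 \\ 12 & -10 & 16 \end{bmatrix}$$

and the adjoint of A is

$$\operatorname{adj}(A) = \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}$$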
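The computation in Example 6 follows Definition 1 directly, so it translates almost line by line into code. This Python sketch (an illustration only; `det`, `cofactor`, `cofactor_matrix`, and `adjoint` are hypothetical helper names) builds the matrix of cofactors and then transposes it.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def cofactor(M, i, j):
    """C_ij = (-1)**(i+j) times the minor obtained by deleting row i and column j (0-based)."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
    return (-1) ** (i + j) * det(minor)


def cofactor_matrix(M):
    n = len(M)
    return [[cofactor(M, i, j) for j in range(n)] for i in range(n)]


def adjoint(M):
    """adj(M): the transpose of the matrix of cofactors."""
    return [list(col) for col in zip(*cofactor_matrix(M))]


A = [[3, 2, -1],
     [1, 6, 3],
     [2, -4, 0]]
print(cofactor_matrix(A))  # [[12, 6, -16], [4, 2, 16], [12, -10, 16]]
print(adjoint(A))          # [[12, 4, 12], [6, 2, -10], [-16, 16, 16]]
```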


Leonard Eugene Dickson (1874-1954) 

The use of the term adjoint for the transpose of the matrix of cofactors appears to have 
been introduced by the American mathematician L. E. Dickson in a research paper that he published in 
1902. 
In Theorem 1.4.5 we gave a formula for the inverse of a 2 x 2 invertible matrix. Our next theorem extends that result to n x n invertible matrices.


THEOREM 2.3.6 Inverse of a Matrix Using Its Adjoint


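If A is an invertible matrix, then

$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) \tag{8}$$

We show first that

$$A\operatorname{adj}(A) = \det(A)I$$

Consider the product

$$A\operatorname{adj}(A) = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} C_{11} & C_{21} & \cdots & C_{j1} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{j2} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{jn} & \cdots & C_{nn} \end{bmatrix}$$

The entry in the ith row and jth column of the product A adj(A) is obtained from the ith row of A and the jth column of adj(A):

$$a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn} \tag{9}$$

If i = j, then (9) is the cofactor expansion of det(A) along the ith row of A (Theorem 2.1.1), and if i ≠ j, then the a's and the cofactors come from different rows of A, so the value of (9) is zero. Therefore,

$$A\operatorname{adj}(A) = \begin{bmatrix} \det(A) & 0 & \cdots & 0 \\ 0 & \det(A) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \det(A) \end{bmatrix} = \det(A)I \tag{10}$$

Since A is invertible, det(A) ≠ 0. Therefore, Equation (10) can be rewritten as

$$\frac{1}{\det(A)}\left[A\operatorname{adj}(A)\right] = I \quad\text{or}\quad A\left[\frac{1}{\det(A)}\operatorname{adj}(A)\right] = I$$

Multiplying both sides on the left by $A^{-1}$ yields

$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A)$$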


EXAMPLE 7 Using the Adjoint to Find an Inverse Matrix 


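Use Formula (8) to find the inverse of the matrix A in Example 6.

We leave it for you to check that det(A) = 64. Thus

$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) = \frac{1}{64}\begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix} = \begin{bmatrix} \frac{12}{64} & \frac{4}{64} & \frac{12}{64} \\ \frac{6}{64} & \frac{2}{64} & -\frac{10}{64} \\ -\frac{16}{64} & \frac{16}{64} & \frac{16}{64} \end{bmatrix}$$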
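Formula (8) can be sketched in Python with exact rational arithmetic, so that the fractions of Example 7 appear in lowest terms. The helper names (`det`, `cofactor`, `adjoint`, `inverse_by_adjoint`, `matmul`) are hypothetical; the sketch is meant for small matrices, since cofactor expansion grows factorially with n, and Gauss-Jordan elimination is the usual practical choice for larger ones.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def cofactor(M, i, j):
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
    return (-1) ** (i + j) * det(minor)


def adjoint(M):
    n = len(M)
    # Entry (i, j) of adj(M) is the cofactor C_ji (the transpose of the cofactor matrix).
    return [[cofactor(M, j, i) for j in range(n)] for i in range(n)]


def inverse_by_adjoint(M):
    """A^{-1} = adj(A) / det(A), computed exactly with Fractions."""
    d = det(M)
    if d == 0:
        raise ValueError("matrix is not invertible (determinant is zero)")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[3, 2, -1],
     [1, 6, 3],
     [2, -4, 0]]
A_inv = inverse_by_adjoint(A)
print(A_inv[0])  # [Fraction(3, 16), Fraction(1, 16), Fraction(3, 16)]
assert matmul(A, A_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```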


Cramer's Rule 


Our next theorem uses the formula for the inverse of an invertible matrix to produce a formula, called Cramer's rule, for the solution of a linear system Ax = b of n equations in n unknowns in the case where the coefficient matrix A is invertible (or, equivalently, when det(A) ≠ 0).


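THEOREM 2.3.7 Cramer's Rule

If Ax = b is a system of n linear equations in n unknowns such that det(A) ≠ 0, then the system has a unique solution. This solution is

$$x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)}$$

where $A_j$ is the matrix obtained by replacing the entries in the jth column of A by the entries in the matrix

$$\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

If det(A) ≠ 0, then A is invertible, and by Theorem 1.6.2, $\mathbf{x} = A^{-1}\mathbf{b}$ is the unique solution of Ax = b. Therefore, by Theorem 2.3.6 we have

$$\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det(A)}\operatorname{adj}(A)\mathbf{b} = \frac{1}{\det(A)}\begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

Multiplying the matrices out gives

$$\mathbf{x} = \frac{1}{\det(A)}\begin{bmatrix} b_1C_{11} + b_2C_{21} + \cdots + b_nC_{n1} \\ b_1C_{12} + b_2C_{22} + \cdots + b_nC_{n2} \\ \vdots \\ b_1C_{1n} + b_2C_{2n} + \cdots + b_nC_{nn} \end{bmatrix}$$

The entry in the jth row of x is therefore

$$x_j = \frac{b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}}{\det(A)} \tag{11}$$

Now let

$$A_j = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1,j-1} & b_1 & a_{1,j+1} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2,j-1} & b_2 & a_{2,j+1} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n,j-1} & b_n & a_{n,j+1} & \cdots & a_{nn} \end{bmatrix}$$

Since $A_j$ differs from A only in the jth column, it follows that the cofactors of entries $b_1, b_2, \ldots, b_n$ in $A_j$ are the same as the cofactors of the corresponding entries in the jth column of A. The cofactor expansion of $\det(A_j)$ along the jth column is therefore

$$\det(A_j) = b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}$$

Substituting this result in (11) gives

$$x_j = \frac{\det(A_j)}{\det(A)}$$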


EXAMPLE 8 Using Cramer's Rule to Solve a Linear System 

Use Cramer's rule to solve

$$\begin{aligned} x_1 + 2x_3 &= 6 \\ -3x_1 + 4x_2 + 6x_3 &= 30 \\ -x_1 - 2x_2 + 3x_3 &= 8 \end{aligned}$$



Gabriel Cramer (1704-1752) 


Variations of Cramer's rule were fairly well known before the Swiss 
mathematician discussed it in work he published in 1750. It was Cramer's superior notation 
that popularized the method and led mathematicians to attach his name to it. 




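Solution

$$A = \begin{bmatrix} 1 & 0 & 2 \\ -3 & 4 & 6 \\ -1 & -2 & 3 \end{bmatrix}, \quad A_1 = \begin{bmatrix} 6 & 0 & 2 \\ 30 & 4 & 6 \\ 8 & -2 & 3 \end{bmatrix},$$

$$A_2 = \begin{bmatrix} 1 & 6 & 2 \\ -3 & 30 & 6 \\ -1 & 8 & 3 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 1 & 0 & 6 \\ -3 & 4 & 30 \\ -1 & -2 & 8 \end{bmatrix}$$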
For n > 3, it is usually more efficient to 
solve a linear system with n equations in n 
unknowns by Gauss-Jordan elimination 
than by Cramer's rule. Its main use is for 
obtaining properties of solutions of a 
linear system without actually solving the 
system. 


Therefore,

$$x_1 = \frac{\det(A_1)}{\det(A)} = \frac{-40}{44} = -\frac{10}{11}, \quad x_2 = \frac{\det(A_2)}{\det(A)} = \frac{72}{44} = \frac{18}{11}, \quad x_3 = \frac{\det(A_3)}{\det(A)} = \frac{152}{44} = \frac{38}{11}$$
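Cramer's rule as stated in Theorem 2.3.7 can be sketched in a few lines of Python. This is an illustration under stated assumptions: `replace_column` and `cramer` are hypothetical names, determinants come from cofactor expansion (so it is only practical for small n, in line with the margin note above), and `Fraction` keeps the answers exact. It reproduces Example 8 and then substitutes the solution back into the system.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def replace_column(M, j, b):
    """The matrix A_j: a copy of M whose column j is replaced by the entries of b."""
    return [row[:j] + [b_i] + row[j + 1:] for row, b_i in zip(M, b)]


def cramer(A, b):
    """Solve Ax = b by x_j = det(A_j) / det(A); requires det(A) != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule does not apply: det(A) = 0")
    return [Fraction(det(replace_column(A, j, b)), d) for j in range(len(A))]


A = [[1, 0, 2],
     [-3, 4, 6],
     [-1, -2, 3]]
b = [6, 30, 8]
x = cramer(A, b)
print(x)  # [Fraction(-10, 11), Fraction(18, 11), Fraction(38, 11)]

# Substituting back into the system recovers b.
assert [sum(a * xi for a, xi in zip(row, x)) for row in A] == b
```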


Equivalence Theorem 

In Theorem 1.6.4 we listed five results that are equivalent to the invertibility of a matrix A. We conclude this 
section by merging Theorem 2.3.3 with that list to produce the following theorem that relates all of the major 
topics we have studied thus far. 


THEOREM 2.3.8 Equivalent Statements

If A is an n x n matrix, then the following statements are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A can be expressed as a product of elementary matrices.

(e) Ax = b is consistent for every n x 1 matrix b.

(f) Ax = b has exactly one solution for every n x 1 matrix b.

(g) det(A) ≠ 0.


OPTIONAL 

We now have all of the machinery necessary to prove the following two results, which we stated without proof in Theorem 1.7.1:

Theorem 1.7.1(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero.

Theorem 1.7.1(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper triangular matrix is upper triangular.

Let $A = [a_{ij}]$ be a triangular matrix, so that its diagonal entries are

$$a_{11}, a_{22}, \ldots, a_{nn}$$

From Theorems 2.3.3 and 2.1.2, the matrix A is invertible if and only if

$$\det(A) = a_{11}a_{22}\cdots a_{nn}$$

is nonzero, which is true if and only if the diagonal entries are all nonzero.

We will prove the result for upper triangular matrices and leave the lower triangular case for you. Assume that A is upper triangular and invertible. Since

$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A)$$

we can prove that $A^{-1}$ is upper triangular by showing that adj(A) is upper triangular or, equivalently, that the matrix of cofactors is lower triangular. We can do this by showing that every cofactor $C_{ij}$ with i < j (i.e., above the main diagonal) is zero. Since

$$C_{ij} = (-1)^{i+j}M_{ij}$$

it suffices to show that each minor $M_{ij}$ with i < j is zero. For this purpose, let $B_{ij}$ be the matrix that results when the ith row and jth column of A are deleted, so

$$M_{ij} = \det(B_{ij}) \tag{12}$$

From the assumption that i < j, it follows that $B_{ij}$ is upper triangular (see Figure 1.7.1). Since A is upper triangular, its (i + 1)-st row begins with at least i zeros. But the ith row of $B_{ij}$ is the (i + 1)-st row of A with the entry in the jth column removed. Since i < j, none of the first i zeros is removed by deleting the jth column; thus the ith row of $B_{ij}$ starts with at least i zeros, which implies that this row has a zero on the main diagonal. It now follows from Theorem 2.1.2 that $\det(B_{ij}) = 0$ and from (12) that $M_{ij} = 0$.
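Both parts of Theorem 1.7.1 proved above can be observed numerically. In this Python sketch (an illustration; the helpers are hypothetical and U is an arbitrary invertible upper triangular test matrix, not from the text), the inverse is computed with the adjoint formula; every entry below its main diagonal comes out zero, and its diagonal entries are the reciprocals $1/a_{11}, \ldots, 1/a_{nn}$.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def cofactor(M, i, j):
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
    return (-1) ** (i + j) * det(minor)


def inverse_by_adjoint(M):
    """A^{-1} = adj(A) / det(A); entry (i, j) of adj(A) is the cofactor C_ji."""
    n, d = len(M), det(M)
    return [[Fraction(cofactor(M, j, i), d) for j in range(n)] for i in range(n)]


U = [[2, 1, 3],
     [0, -1, 4],
     [0, 0, 5]]   # an arbitrary invertible upper triangular matrix
U_inv = inverse_by_adjoint(U)
n = len(U)

# The inverse is upper triangular: every entry below the main diagonal is zero.
assert all(U_inv[i][j] == 0 for i in range(n) for j in range(i))
# Its diagonal entries are the reciprocals of the diagonal entries of U.
assert [U_inv[i][i] for i in range(n)] == [Fraction(1, 2), Fraction(-1), Fraction(1, 5)]
```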


Concept Review 

Determinant test for invertibility 
Matrix of cofactors 
Adjoint of a matrix 
Cramer's rule 

Equivalent statements about an invertible matrix 

Skills 

Know how determinants behave with respect to basic arithmetic operations, as given in Equation (1), Theorem 2.3.1, Lemma 2.3.2, and Theorem 2.3.4.

Use the determinant to test a matrix for invertibility.

Know how det(A) and det($A^{-1}$) are related.

Compute the matrix of cofactors for a square matrix A.

Compute adj(A) for a square matrix A.

Use the adjoint of an invertible matrix to find its inverse.

Use Cramer's rule to solve linear systems of equations.

Know the equivalent characterizations of an invertible matrix given in Theorem 2.3.8.


Exercise Set 2.3 

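In Exercises 1-4, verify that det(kA) = $k^n$ det(A).

1. $A = \begin{bmatrix} -1 & 2 \\ 3 & 4 \end{bmatrix}; \quad k = 2$

2. $A = \begin{bmatrix} 2 & 2 \\ 5 & -2 \end{bmatrix}; \quad k = -4$

3. $A = \begin{bmatrix} 2 & -1 & 3 \\ 3 & 2 & 1 \\ 1 & 4 & 5 \end{bmatrix}; \quad k = -2$

4. $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 1 & -2 \end{bmatrix}; \quad k = 3$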


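In Exercises 5-6, verify that det(AB) = det(BA) and determine whether the equality det(A + B) = det(A) + det(B) holds.

5. $A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix} \text{ and } B = \begin{bmatrix} 1 & -1 & 3 \\ 7 & 1 & 2 \\ 5 & 0 & 1 \end{bmatrix}$

6. $A = \begin{bmatrix} -1 & 8 & 2 \\ 1 & 0 & -1 \\ -2 & 2 & 2 \end{bmatrix} \text{ and } B = \begin{bmatrix} 2 & -1 & -4 \\ 1 & 1 & 3 \\ 0 & 3 & -1 \end{bmatrix}$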


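In Exercises 7-14, use determinants to decide whether the given matrix is invertible.

7. $A = \begin{bmatrix} 2 & 5 & 5 \\ -1 & -1 & 0 \\ 2 & 4 & 3 \end{bmatrix}$

Answer:

Invertible

8. $A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}$

9. $A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}$

Answer:

Invertible

10. $A = \begin{bmatrix} -3 & 0 & 1 \\ 5 & 0 & 6 \\ 8 & 0 & 3 \end{bmatrix}$

11. $A = \begin{bmatrix} 4 & 2 & 8 \\ -2 & 1 & -4 \\ 3 & 1 & 6 \end{bmatrix}$

Answer:

Not invertible

12. $A = \begin{bmatrix} 1 & 0 & -1 \\ 9 & -1 & 4 \\ 8 & 9 & -1 \end{bmatrix}$

13. $A = \begin{bmatrix} 2 & 0 & 0 \\ 8 & 1 & 0 \\ -5 & 3 & 6 \end{bmatrix}$

Answer:

Invertible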

{2 0 

3/2 - 3/7 0 

5 -9 0 

In Exercises 15-18, find the values of k for which A is invertible. 



Answer: 




17. $A = \begin{bmatrix} 1 & 2 & 4 \\ 3 & 1 & 6 \\ k & 3 & 2 \end{bmatrix}$

Answer:

k ≠ -1

18. $A = \begin{bmatrix} 1 & 2 & 0 \\ k & 1 & k \\ 0 & 2 & 1 \end{bmatrix}$

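In Exercises 19-23, decide whether the given matrix is invertible, and if so, use the adjoint method to find its inverse.

19. $A = \begin{bmatrix} 2 & 5 & 5 \\ -1 & -1 & 0 \\ 2 & 4 & 3 \end{bmatrix}$

Answer:

$A^{-1} = \begin{bmatrix} 3 & -5 & -5 \\ -3 & 4 & 5 \\ 2 & -2 & -3 \end{bmatrix}$

20. $A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}$

21. $A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}$

Answer:

22. $A = \begin{bmatrix} 2 & 0 & 0 \\ 8 & 1 & 0 \\ -5 & 3 & 6 \end{bmatrix}$

23. $A = \begin{bmatrix} 1 & 3 & 1 & 1 \\ 2 & 5 & 2 & 2 \\ 1 & 3 & 8 & 9 \\ 1 & 3 & 2 & 2 \end{bmatrix}$

Answer:

$A^{-1} = \begin{bmatrix} -4 & 3 & 0 & -1 \\ 2 & -1 & 0 & 0 \\ -7 & 0 & -1 & 8 \\ 6 & 0 & 1 & -7 \end{bmatrix}$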


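In Exercises 24-29, solve by Cramer's rule, where it applies.

24. $\begin{aligned} 7x_1 - 2x_2 &= 3 \\ 3x_1 + x_2 &= 5 \end{aligned}$

25. $\begin{aligned} 4x + 5y \phantom{{}+2z} &= 2 \\ 11x + y + 2z &= 3 \\ x + 5y + 2z &= 1 \end{aligned}$

Answer:

$x = \frac{3}{11}, \quad y = \frac{2}{11}, \quad z = -\frac{1}{11}$

26. $\begin{aligned} x - 4y + z &= 6 \\ 4x - y + 2z &= -1 \\ 2x + 2y - 3z &= -20 \end{aligned}$

27. $\begin{aligned} x_1 - 3x_2 + x_3 &= 4 \\ 2x_1 - x_2 \phantom{{}+x_3} &= -2 \\ 4x_1 \phantom{{}-3x_2} - 3x_3 &= 0 \end{aligned}$

Answer:

$x_1 = -\frac{30}{11}, \quad x_2 = -\frac{38}{11}, \quad x_3 = -\frac{40}{11}$

28. $\begin{aligned} -x_1 - 4x_2 + 2x_3 + x_4 &= -32 \\ 2x_1 - x_2 + 7x_3 + 9x_4 &= 14 \\ -x_1 + x_2 + 3x_3 + x_4 &= 11 \\ x_1 - 2x_2 + x_3 - 4x_4 &= -4 \end{aligned}$

29. $\begin{aligned} 3x_1 - x_2 + x_3 &= 4 \\ -x_1 + 7x_2 - 2x_3 &= 1 \\ 2x_1 + 6x_2 - x_3 &= 5 \end{aligned}$

Answer:

Cramer's rule does not apply.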
30. Show that the matrix

$$A = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

is invertible for all values of θ; then find $A^{-1}$ using Theorem 2.3.6.


31. Use Cramer's rule to solve for y without solving for the unknowns x, z, and w.

$$\begin{aligned} 4x + y + z + w &= 6 \\ 3x + 7y - z + w &= 1 \\ 7x + 3y - 5z + 8w &= -3 \\ x + y + z + 2w &= 3 \end{aligned}$$

Answer:

y = 0

32. Let Ax = b be the system in Exercise 31.

(a) Solve by Cramer's rule.

(b) Solve by Gauss-Jordan elimination.

(c) Which method involves fewer computations?

33. Prove that if det(A) = 1 and all the entries in A are integers, then all the entries in $A^{-1}$ are integers.

34. Let Ax = b be a system of n linear equations in n unknowns with integer coefficients and integer constants. Prove that if det(A) = 1, the solution x has integer entries.

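35. Let

$$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$

Assuming that det(A) = -7, find

(a) det(3A)

(b) det($A^{-1}$)

(c) det($2A^{-1}$)

(d) det($(2A)^{-1}$)

(e) $\det\begin{bmatrix} a & g & d \\ b & h & e \\ c & i & f \end{bmatrix}$

Answer:

(a) -189

(b) $-\frac{1}{7}$

(c) $-\frac{8}{7}$

(d) $-\frac{1}{56}$

(e) 7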

36. In each part, find the determinant given that A is a 4 x 4 matrix for which det(A) = -2.

(a) det(-A)

(b) det($A^{-1}$)

(c) det($2A^T$)

(d) det($A^3$)

37. In each part, find the determinant given that A is a 3 x 3 matrix for which det(A) = 7.

(a) det(3A)

(b) det($A^{-1}$)

(c) det($2A^{-1}$)

(d) det($(2A)^{-1}$)

Answer:

(a) 189

(b) $\frac{1}{7}$

(c) $\frac{8}{7}$

(d) $\frac{1}{56}$

38. Prove that a square matrix A is invertible if and only if $A^TA$ is invertible.

39. Show that if A is a square matrix, then det($A^TA$) = det($AA^T$).

True-False Exercises 

In parts (a)-(l) determine whether the statement is true or false, and justify your answer. 

(a) If A is a 3 x 3 matrix, then det(2A) = 2 det(A).

Answer:

False

(b) If A and B are square matrices of the same size such that det(A) = det(B), then det(A + B) = 2 det(A).

Answer:

False

(c) If A and B are square matrices of the same size and A is invertible, then

det($A^{-1}BA$) = det(B)

Answer:

True

(d) A square matrix A is invertible if and only if det(A) = 0.

Answer:

False

(e) The matrix of cofactors of A is precisely $[\operatorname{adj}(A)]^T$.

Answer:

True

(f) For every n x n matrix A, we have

A · adj(A) = (det(A))$I_n$

Answer:

True

(g) If A is a square matrix and the linear system Ax = 0 has multiple solutions for x, then det(A) = 0.

Answer:

True

(h) If A is an n x n matrix and there exists an n x 1 matrix b such that the linear system Ax = b has no solutions, then the reduced row echelon form of A cannot be $I_n$.

Answer:

True

(i) If E is an elementary matrix, then Ex = 0 has only the trivial solution.

Answer:

True

(j) If A is an invertible matrix, then the linear system Ax = 0 has only the trivial solution if and only if the linear system $A^{-1}$x = 0 has only the trivial solution.

Answer:

True

(k) If A is invertible, then adj(A) must also be invertible.

Answer:

True

(l) If A has a row of zeros, then so does adj(A).

Answer:

False
Supplementary Exercises


In Exercises 1-8, evaluate the determinant of the given matrix by (a) cofactor expansion and (b) using 
elementary row operations to introduce zeros into the matrix. 


1. $\begin{bmatrix} -4 & 2 \\ 3 & 3 \end{bmatrix}$

Answer:

-18


2 . 



0 2-1 

-3 1 1 


Answer: 


24 


4. 



1 1 1 

0 4 2 


Answer: 


-10 


-5 

1 

4 


3 

0 

2 


1 

-2 

2 


3 

6 

0 

1 

-2 

3 

1 

4 

1 

0 ■ 

-1 

1 

-9 

2 ■ 

-2 

2 


Answer: 

329 

"—1 -2 -3 -4 
4 3 2 1 

12 3 4 

—4 -3 -2 -1 


8 . 


















9. Evaluate the determinants in Exercises 3-6 by using the arrow technique (see Example 7 in Section 2.1). 
Answer: 

Exercise 3: 24; Exercise 4: 0; Exercise 5: -10; Exercise 6: -48 

10. (a) Construct a 4 x 4 matrix whose determinant is easy to compute using cofactor expansion but hard to evaluate using elementary row operations.

(b) Construct a 4 x 4 matrix whose determinant is easy to compute using elementary row operations but hard to evaluate using cofactor expansion.

11. Use the determinant to decide whether the matrices in Exercises 1-4 are invertible. 

Answer: 


The matrices in Exercises 1-3 are invertible, the matrix in Exercise 4 is not. 

12. Use the determinant to decide whether the matrices in Exercises 5-8 are invertible. 

In Exercises 13-15, find the determinant of the given matrix by any method. 


13. 


5 

6-2 


b -3 
-3 


Answer: 


14. 


15. 


_* 2 + 56-21 
3-4 a 

a 1 1 2 

2 a-1 4 

0 0 0 0 

0 0 0 -4 

0 0-1 0 

0 2 0 0 

5 0 0 0 


-3 

0 

0 

0 

0 


Answer: 

-120 

16. Solve for x.

$$\begin{vmatrix} x & -1 \\ 3 & 1 - x \end{vmatrix} = \begin{vmatrix} 1 & 0 & -3 \\ 2 & x & -6 \\ 1 & 3 & x - 5 \end{vmatrix}$$


In Exercises 17-24, use the adjoint method (Theorem 2.3.6) to find the inverse of the given matrix, if it 
exists. 


17. The matrix in Exercise 1. 



Answer:

$\begin{bmatrix} -\frac{1}{6} & \frac{1}{9} \\ \frac{1}{6} & \frac{2}{9} \end{bmatrix}$

18. The matrix in Exercise 2. 

19. The matrix in Exercise 3. 


Answer: 


1 

1 

3 

8 

8 

8 

1 

5 

1 

8 

24 

24 

1 

7 

1 

4 

12 

12 


20. The matrix in Exercise 4. 

21. The matrix in Exercise 5. 


Answer: 


1 2 _J_ 

5 5 10 

1 _3 2 

5 5 5 

_2 6 _ 3 _ 

5 5 10 

22. The matrix in Exercise 6. 

23. The matrix in Exercise 7. 


Answer: 


10 

2 

52 

27 

329 

329 

329 

329 

55 

11 

43 

16 

329 

329 

329 

329 

3 

10 

25 

6 

47 

47 

47 

47 

31 

72 

102 

15 

"329 

329 

329 

329 


24. The matrix in Exercise 8. 

25. Use Cramer's rule to solve for x' and y' in terms of x and y.



Answer: 


x 

y 




r 


/ 


*' = |* + ^y, y' = - JX + 


26. Use Cramer's rule to solve for x' and y' in terms of x and y.

$$\begin{aligned} x &= x'\cos\theta - y'\sin\theta \\ y &= x'\sin\theta + y'\cos\theta \end{aligned}$$


27. By examining the determinant of the coefficient matrix, show that the following system has a nontrivial solution if and only if α = β.

$$\begin{aligned} x + y + \alpha z &= 0 \\ x + y + \beta z &= 0 \\ \alpha x + \beta y + z &= 0 \end{aligned}$$


28. Let A be a 3 x 3 matrix, each of whose entries is 1 or 0. What is the largest possible value for det(A)?

29. (a) For the triangle in the accompanying figure, use trigonometry to show that

$$\begin{aligned} b\cos\gamma + c\cos\beta &= a \\ c\cos\alpha + a\cos\gamma &= b \\ a\cos\beta + b\cos\alpha &= c \end{aligned}$$

and then apply Cramer's rule to show that

$$\cos\alpha = \frac{b^2 + c^2 - a^2}{2bc}$$

(b) Use Cramer's rule to obtain similar formulas for cos β and cos γ.

Figure Ex-29

Answer:

(b) $\cos\beta = \dfrac{a^2 + c^2 - b^2}{2ac}, \quad \cos\gamma = \dfrac{a^2 + b^2 - c^2}{2ab}$

30. Use determinants to show that for all real values of λ, the only solution of

$$\begin{aligned} x - 2y &= \lambda x \\ x - y &= \lambda y \end{aligned}$$

is x = 0, y = 0.






31. Prove: If A is invertible, then adj(A) is invertible and

$$[\operatorname{adj}(A)]^{-1} = \frac{1}{\det(A)}A = \operatorname{adj}(A^{-1})$$


32. Prove: If A is an n x n matrix, then

$$\det[\operatorname{adj}(A)] = [\det(A)]^{n-1}$$


33. Prove: If the entries in each row of an n x n matrix A add up to zero, then the determinant of A is zero. [Hint: Consider the product AX, where X is the n x 1 matrix, each of whose entries is one.]

34. (a) In the accompanying figure, the area of the triangle ABC can be expressed as

area ABC = area ADEC + area CEFB - area ADFB

Use this and the fact that the area of a trapezoid equals 1/2 the altitude times the sum of the parallel sides to show that

$\text{area } ABC = \dfrac{1}{2}\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}$

[Note: In the derivation of this formula, the vertices are labeled such that the triangle is traced counterclockwise proceeding from $(x_1, y_1)$ to $(x_2, y_2)$ to $(x_3, y_3)$. For a clockwise orientation, the determinant above yields the negative of the area.]

(b) Use the result in (a) to find the area of the triangle with vertices (3, 3), (4, 0), (-2, -1).
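The formula in part (a) can be checked numerically. The Python sketch below is an illustration, not part of the exercise: `det3` and `signed_triangle_area` are hypothetical helper names, and exact `Fraction` arithmetic is an assumption chosen to avoid rounding. It returns one half of the $3 \times 3$ determinant with rows $(x_i, y_i, 1)$. The sign therefore reports the orientation described in the Note, and the absolute value is the area.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix (list of rows) by cofactor expansion along row 1."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def signed_triangle_area(p1, p2, p3):
    """One half of the determinant whose rows are (x_i, y_i, 1).

    Positive when p1 -> p2 -> p3 runs counterclockwise, negative when clockwise.
    """
    rows = [[Fraction(x), Fraction(y), Fraction(1)] for (x, y) in (p1, p2, p3)]
    return det3(rows) / 2


print(signed_triangle_area((0, 0), (1, 0), (0, 1)))  # 1/2  (counterclockwise)
print(signed_triangle_area((0, 0), (0, 1), (1, 0)))  # -1/2 (clockwise)
```

Swapping two vertices swaps two rows of the determinant, which is why the sign flips while the magnitude stays the same.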



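35. Use the fact that 21,375, 38,798, 34,162, 40,223, and 79,154 are all divisible by 19 to show that

$\begin{vmatrix} 2 & 1 & 3 & 7 & 5 \\ 3 & 8 & 7 & 9 & 8 \\ 3 & 4 & 1 & 6 & 2 \\ 4 & 0 & 2 & 2 & 3 \\ 7 & 9 & 1 & 5 & 4 \end{vmatrix}$

is divisible by 19 without directly evaluating the determinant.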
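36. Without directly evaluating the determinant, show that

$\begin{vmatrix} \sin\alpha & \cos\alpha & \sin(\alpha + \delta) \\ \sin\beta & \cos\beta & \sin(\beta + \delta) \\ \sin\gamma & \cos\gamma & \sin(\gamma + \delta) \end{vmatrix} = 0$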
CHAPTER 3



Euclidean Vector Spaces 



CHAPTER CONTENTS 

Vectors in 2-Space, 3-Space, and n-Space

Norm, Dot Product, and Distance in $R^n$

Orthogonality

The Geometry of Linear Systems

Cross Product


INTRODUCTION 

Engineers and physicists distinguish between two types of physical quantities: scalars,
which are quantities that can be described by a numerical value alone, and vectors, which 
are quantities that require both a number and a direction for their complete physical 
description. For example, temperature, length, and speed are scalars because they can be 
fully described by a number that tells “how much”—a temperature of 20°C, a length of 5 
cm, or a speed of 75 km/h. In contrast, velocity and force are vectors because they require 
a number that tells “how much” and a direction that tells “which way”—say, a boat 
moving at 10 knots in a direction 45° northeast, or a force of 100 lb acting vertically. 
Although the notions of vectors and scalars that we will study in this text have their 
origins in physics and engineering, we will be more concerned with using them to build 
mathematical structures and then applying those structures to such diverse fields as 
genetics, computer science, economics, telecommunications, and environmental science. 
3.1 Vectors in 2-Space, 3-Space, and n-Space

Linear algebra is concerned with two kinds of mathematical objects, “matrices” and “vectors.” We are already 
familiar with the basic ideas about matrices, so in this section we will introduce some of the basic ideas about 
vectors. As we progress through this text we will see that vectors and matrices are closely related and that 
much of linear algebra is concerned with that relationship. 


Geometric Vectors 

Engineers and physicists represent vectors in two dimensions (also called 2-space) or in three dimensions 
(also called 3-space) by arrows. The direction of the arrowhead specifies the direction of the vector and the 
length of the arrow specifies the magnitude. Mathematicians call these geometric vectors. The tail of the 
arrow is called the initial point of the vector and the tip the terminal point (Figure 3.1.1). 

Figure 3.1.1 Initial point and terminal point of a vector.

In this text we will denote vectors in boldface type such as a, b, v, w, and x, and we will denote scalars in 
lowercase italic type such as a, k, v, w, and x. When we want to indicate that a vector v has initial point A and 
terminal point B, then, as shown in Figure 3.1.2, we will write 

$\mathbf{v} = \overrightarrow{AB}$

Figure 3.1.2

Vectors with the same length and direction, such as those in Figure 3.1.3, are said to be equivalent. Since we 
want a vector to be determined solely by its length and direction, equivalent vectors are regarded to be the 
same vector even though they may be in different positions. Equivalent vectors are also said to be equal, 
which we indicate by writing 


$\mathbf{v} = \mathbf{w}$

Figure 3.1.3 Equivalent vectors.

The vector whose initial and terminal points coincide has length zero, so we call this the zero vector and 
denote it by 0. The zero vector has no natural direction, so we will agree that it can be assigned any direction 
that is convenient for the problem at hand. 


Vector Addition 

There are a number of important algebraic operations on vectors, all of which have their origin in laws of 
physics. 


Parallelogram Rule for Vector Addition 


If v and w are vectors in 2-space or 3-space that are positioned so their initial points coincide, then the two vectors form adjacent sides of a parallelogram, and the sum v + w is the vector represented by the arrow from the common initial point of v and w to the opposite vertex of the parallelogram (Figure 3.1.4a).


Figure 3.1.4 (a) Parallelogram rule. (b) Triangle rule. (c) v + w = w + v.

Here is another way to form the sum of two vectors. 


Triangle Rule for Vector Addition 

If v and w are vectors in 2-space or 3-space that are positioned so the initial point of w is at the terminal point of v, then the sum v + w is represented by the arrow from the initial point of v to the terminal point of w (Figure 3.1.4b).


In Figure 3.1.4c we have constructed the sums v + w and w + v by the triangle rule. This construction makes it evident that

$\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v} \qquad (1)$

and that the sum obtained by the triangle rule is the same as the sum obtained by the parallelogram rule.

Vector addition can also be viewed as a process of translating points. 


Vector Addition Viewed as Translation 

If v, w, and v + w are positioned so their initial points coincide, then the terminal point of v + w can be viewed in two ways:

1. The terminal point of v + w is the point that results when the terminal point of v is translated in the direction of w by a distance equal to the length of w (Figure 3.1.5a).

2. The terminal point of v + w is the point that results when the terminal point of w is translated in the direction of v by a distance equal to the length of v (Figure 3.1.5b).

Accordingly, we say that v + w is the translation of v by w or, alternatively, the translation of w by v.


Figure 3.1.5


Vector Subtraction 

In ordinary arithmetic we can write a - b = a + (-b), which expresses subtraction in terms of addition. There is an analogous idea in vector arithmetic.


Vector Subtraction

The negative of a vector v, denoted by -v, is the vector that has the same length as v but is oppositely directed (Figure 3.1.6a), and the difference of v from w, denoted by w - v, is taken to be the sum

$\mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) \qquad (2)$


Figure 3.1.6 (a) v and -v. (b) w - v by the parallelogram method. (c) w - v drawn directly.


The difference of v from w can be obtained geometrically by the parallelogram method shown in Figure 3.1.6b, or more directly by positioning w and v so their initial points coincide and drawing the vector from the terminal point of v to the terminal point of w (Figure 3.1.6c).


Scalar Multiplication 

Sometimes there is a need to change the length of a vector or change its length and reverse its direction. This is accomplished by a type of multiplication in which vectors are multiplied by scalars. As an example, the product 2v denotes the vector that has the same direction as v but twice the length, and the product -2v denotes the vector that is oppositely directed to v and has twice the length. Here is the general result.


Scalar Multiplication

If v is a nonzero vector in 2-space or 3-space, and if k is a nonzero scalar, then we define the scalar product of v by k to be the vector whose length is |k| times the length of v and whose direction is the same as that of v if k is positive and opposite to that of v if k is negative. If k = 0 or v = 0, then we define kv to be 0.


Figure 3.1.7 shows the geometric relationship between a vector v and some of its scalar multiples. In particular, observe that (-1)v has the same length as v but is oppositely directed; therefore,

$(-1)\mathbf{v} = -\mathbf{v} \qquad (3)$

Figure 3.1.7


Parallel and Collinear Vectors 

Suppose that v and w are vectors in 2-space or 3-space with a common initial point. If one of the vectors is a scalar multiple of the other, then the vectors lie on a common line, so it is reasonable to say that they are collinear (Figure 3.1.8a). However, if we translate one of the vectors, as indicated in Figure 3.1.8b, then the vectors are parallel but no longer collinear. This creates a linguistic problem because translating a vector does not change it. The only way to resolve this problem is to agree that the terms parallel and collinear mean the same thing when applied to vectors. Although the vector 0 has no clearly defined direction, we will regard it to be parallel to all vectors when convenient.


Figure 3.1.8 (a) Collinear vectors v and kv. (b) The same vectors after kv is translated.
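Once vectors are written in components (introduced later in this section), this convention becomes a concrete test. The Python sketch below is one possible version. It assumes vectors are equal-length tuples of numbers and extends the idea to any number of components. Two nonzero vectors are treated as parallel exactly when one is a scalar multiple of the other, and the zero vector counts as parallel to every vector. The name `is_parallel` is hypothetical.

```python
from fractions import Fraction


def is_parallel(u, v):
    """True if u and v (same length) are scalar multiples of each other.

    The zero vector is regarded as parallel to every vector, following the
    convention adopted in the text.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    u = [Fraction(c) for c in u]  # exact arithmetic avoids float round-off
    v = [Fraction(c) for c in v]
    if all(c == 0 for c in u) or all(c == 0 for c in v):
        return True
    # The first nonzero component of v fixes the only candidate k with u = k v.
    i = next(j for j, c in enumerate(v) if c != 0)
    k = u[i] / v[i]
    return all(a == k * b for a, b in zip(u, v))


print(is_parallel((2, -4, 6), (-1, 2, -3)))  # True, since (2, -4, 6) = -2(-1, 2, -3)
print(is_parallel((1, 2), (2, 3)))           # False
print(is_parallel((0, 0, 0), (5, 1, 7)))     # True by convention
```

A negative k (oppositely directed vectors) still counts as parallel, which matches the definition of scalar multiplication above.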


Sums of Three or More Vectors 

Vector addition satisfies the associative law for addition, meaning that when we add three vectors, say u, v, and w, it does not matter which two we add first; that is,

$\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$

It follows from this that there is no ambiguity in the expression u + v + w because the same result is obtained no matter how the vectors are grouped.

A simple way to construct u + v + w is to place the vectors “tip to tail” in succession and then draw the vector from the initial point of u to the terminal point of w (Figure 3.1.9a). The tip-to-tail method also works for four or more vectors (Figure 3.1.9b). The tip-to-tail method also makes it evident that if u, v, and w are vectors in 3-space with a common initial point, then u + v + w is the diagonal of the parallelepiped that has the three vectors as adjacent sides (Figure 3.1.9c).


Figure 3.1.9 (a) u + (v + w) = (u + v) + w. (b) Tip-to-tail sum of four vectors. (c) u + v + w as the diagonal of a parallelepiped.


Vectors in Coordinate Systems 

Up until now we have discussed vectors without reference to a coordinate system. However, as we will soon 
see, computations with vectors are much simpler to perform if a coordinate system is present to work with. 

The component forms of the zero vector are 0 = (0, 0) in 2-space and 0 = (0, 0, 0) in 3-space.


If a vector v in 2-space or 3-space is positioned with its initial point at the origin of a rectangular coordinate system, then the vector is completely determined by the coordinates of its terminal point (Figure 3.1.10). We call these coordinates the components of v relative to the coordinate system. We will write $\mathbf{v} = (v_1, v_2)$ to denote a vector v in 2-space with components $(v_1, v_2)$, and $\mathbf{v} = (v_1, v_2, v_3)$ to denote a vector v in 3-space with components $(v_1, v_2, v_3)$.


Figure 3.1.10

It should be evident geometrically that two vectors in 2-space or 3-space are equivalent if and only if they have the same terminal point when their initial points are at the origin. Algebraically, this means that two vectors are equivalent if and only if their corresponding components are equal. Thus, for example, the vectors

$\mathbf{v} = (v_1, v_2, v_3) \quad \text{and} \quad \mathbf{w} = (w_1, w_2, w_3)$

in 3-space are equivalent if and only if

$v_1 = w_1, \quad v_2 = w_2, \quad v_3 = w_3$

It may have occurred to you that an ordered pair $(v_1, v_2)$ can represent either a vector with components $v_1$ and $v_2$ or a point with coordinates $v_1$ and $v_2$ (and similarly for ordered triples). Both are valid geometric interpretations, so the appropriate choice will depend on the geometric viewpoint that we want to emphasize (Figure 3.1.11).


Figure 3.1.11 The ordered pair $(v_1, v_2)$ can represent a point or a vector.


Vectors Whose Initial Point Is Not at the Origin 

It is sometimes necessary to consider vectors whose initial points are not at the origin. If $\overrightarrow{P_1P_2}$ denotes the vector with initial point $P_1(x_1, y_1)$ and terminal point $P_2(x_2, y_2)$, then the components of this vector are given by the formula

$\overrightarrow{P_1P_2} = (x_2 - x_1, \; y_2 - y_1) \qquad (4)$

That is, the components of $\overrightarrow{P_1P_2}$ are obtained by subtracting the coordinates of the initial point from the coordinates of the terminal point. For example, in Figure 3.1.12 the vector $\overrightarrow{P_1P_2}$ is the difference of vectors $\overrightarrow{OP_2}$ and $\overrightarrow{OP_1}$, so

$\overrightarrow{P_1P_2} = \overrightarrow{OP_2} - \overrightarrow{OP_1} = (x_2, y_2) - (x_1, y_1) = (x_2 - x_1, \; y_2 - y_1)$

As you might expect, the components of a vector in 3-space that has initial point $P_1(x_1, y_1, z_1)$ and terminal point $P_2(x_2, y_2, z_2)$ are given by

$\overrightarrow{P_1P_2} = (x_2 - x_1, \; y_2 - y_1, \; z_2 - z_1) \qquad (5)$


Figure 3.1.12 $\mathbf{v} = \overrightarrow{P_1P_2} = \overrightarrow{OP_2} - \overrightarrow{OP_1}$


EXAMPLE 1 Finding the Components of a Vector 

The components of the vector $\mathbf{v} = \overrightarrow{P_1P_2}$ with initial point $P_1(2, -1, 4)$ and terminal point $P_2(7, 5, -8)$ are

$\mathbf{v} = (7 - 2, \; 5 - (-1), \; (-8) - 4) = (5, 6, -12)$
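Formulas (4) and (5) translate directly into a few lines of code. The Python sketch below is illustrative only. The name `components` is hypothetical, points are assumed to be tuples of numbers, and applying the same subtraction to points with more than three coordinates anticipates the n-space ideas that follow.

```python
def components(p1, p2):
    """Components of the vector from initial point p1 to terminal point p2.

    Follows Formulas (4) and (5): subtract the coordinates of the initial
    point from those of the terminal point. The same rule is applied to
    points with any number of coordinates, as long as both points match.
    """
    if len(p1) != len(p2):
        raise ValueError("points must have the same number of coordinates")
    return tuple(b - a for a, b in zip(p1, p2))


# Example 1: P1(2, -1, 4) and P2(7, 5, -8).
print(components((2, -1, 4), (7, 5, -8)))  # (5, 6, -12)
```

The order of the arguments matters. Swapping the points gives the negative of the vector, (-5, -6, 12).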


n-Space 

The idea of using ordered pairs and triples of real numbers to represent points in two-dimensional space and three-dimensional space was well known in the eighteenth and nineteenth centuries. By the dawn of the twentieth century, mathematicians and physicists were exploring the use of “higher-dimensional” spaces in mathematics and physics. Today, even the layman is familiar with the notion of time as a fourth dimension, an idea used by Albert Einstein in developing the general theory of relativity. Physicists working in the field of “string theory” now commonly use 11-dimensional space in their quest for a unified theory that will explain how the fundamental forces of nature work. Much of the remaining work in this section is concerned with extending the notion of space to n dimensions.

To explore these ideas further, we start with some terminology and notation. The set of all real numbers can be viewed geometrically as a line. It is called the real line and is denoted by $R$ or $R^1$. The superscript reinforces the intuitive idea that a line is one-dimensional. The set of all ordered pairs of real numbers (called 2-tuples) and the set of all ordered triples of real numbers (called 3-tuples) are denoted by $R^2$ and $R^3$, respectively. The superscript reinforces the idea that the ordered pairs correspond to points in the plane (two-dimensional) and ordered triples to points in space (three-dimensional). The following definition extends this idea.


DEFINITION 1 

If n is a positive integer, then an ordered n-tuple is a sequence of n real numbers $(v_1, v_2, \ldots, v_n)$. The set of all ordered n-tuples is called n-space and is denoted by $R^n$.


You can think of the numbers in an n-tuple $(v_1, v_2, \ldots, v_n)$ as either the coordinates of a generalized point or the components of a generalized vector, depending on the geometric image you want to bring to mind; the choice makes no difference mathematically, since it is the algebraic properties of n-tuples that are of concern.


Here are some typical applications that lead to n-tuples.



Experimental Data A scientist performs an experiment and makes n numerical measurements each time the experiment is performed. The result of each experiment can be regarded as a vector $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in $R^n$ in which $y_1, y_2, \ldots, y_n$ are the measured values.

Storage and Warehousing A national trucking company has 15 depots for storing and servicing its trucks. At each point in time the distribution of trucks in the service depots can be described by a 15-tuple $\mathbf{x} = (x_1, x_2, \ldots, x_{15})$ in which $x_1$ is the number of trucks in the first depot, $x_2$ is the number in the second depot, and so forth.

Electrical Circuits A certain kind of processing chip is designed to receive four input voltages and produce three output voltages in response. The input voltages can be regarded as vectors in $R^4$ and the output voltages as vectors in $R^3$. Thus, the chip can be viewed as a device that transforms an input vector $\mathbf{v} = (v_1, v_2, v_3, v_4)$ in $R^4$ into an output vector $\mathbf{w} = (w_1, w_2, w_3)$ in $R^3$.

Graphical Images One way in which color images are created on computer screens is by assigning each pixel (an addressable point on the screen) three numbers that describe the hue, saturation, and brightness of the pixel. Thus, a complete color image can be viewed as a set of 5-tuples of the form $\mathbf{v} = (x, y, h, s, b)$ in which x and y are the screen coordinates of a pixel and h, s, and b are its hue, saturation, and brightness.

Economics One approach to economic analysis is to divide an economy into sectors (manufacturing, services, utilities, and so forth) and measure the output of each sector by a dollar value. Thus, in an economy with 10 sectors the economic output of the entire economy can be represented by a 10-tuple $\mathbf{s} = (s_1, s_2, \ldots, s_{10})$ in which the numbers $s_1, s_2, \ldots, s_{10}$ are the outputs of the individual sectors.

Mechanical Systems Suppose that six particles move along the same coordinate line so that at time t their coordinates are $x_1, x_2, \ldots, x_6$ and their velocities are $v_1, v_2, \ldots, v_6$, respectively. This information can be represented by the vector

$\mathbf{v} = (x_1, x_2, x_3, x_4, x_5, x_6, v_1, v_2, v_3, v_4, v_5, v_6, t)$

in $R^{13}$. This vector is called the state of the particle system at time t.



The German-born physicist Albert Einstein immigrated to the United States in 1933, where he settled in Princeton, New Jersey. Einstein spent the last three decades of his life working unsuccessfully at producing a unified field theory that would establish an underlying link between the forces of gravity and electromagnetism. Recently, physicists have made progress on the problem using a framework known as string theory. In this theory the smallest, indivisible components of the Universe are not particles but loops that behave like vibrating strings. Whereas Einstein's space-time universe was four-dimensional, strings reside in an 11-dimensional world that is the focus of current research.

[Image: © Bettmann/© Corbis]


Operations on Vectors in $R^n$

Our next goal is to define useful operations on vectors in $R^n$. These operations will all be natural extensions of the familiar operations on vectors in $R^2$ and $R^3$. We will denote a vector v in $R^n$ using the notation

$\mathbf{v} = (v_1, v_2, \ldots, v_n)$

and we will call $\mathbf{0} = (0, 0, \ldots, 0)$ the zero vector.

We noted earlier that in $R^2$ and $R^3$ two vectors are equivalent (equal) if and only if their corresponding components are the same. Thus, we make the following definition.


DEFINITION 2 

Vectors $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ in $R^n$ are said to be equivalent (also called equal) if

$v_1 = w_1, \quad v_2 = w_2, \quad \ldots, \quad v_n = w_n$

We indicate this by writing v = w.


EXAMPLE 2 Equality of Vectors 

$(a, b, c, d) = (1, -4, 2, 7)$

if and only if a = 1, b = -4, c = 2, and d = 7.


Our next objective is to define the operations of addition, subtraction, and scalar multiplication for vectors in $R^n$. To motivate these ideas, we will consider how these operations can be performed on vectors in $R^2$ using components. By studying Figure 3.1.13 you should be able to deduce that if $\mathbf{v} = (v_1, v_2)$ and $\mathbf{w} = (w_1, w_2)$, then

$\mathbf{v} + \mathbf{w} = (v_1 + w_1, \; v_2 + w_2) \qquad (6)$

$k\mathbf{v} = (kv_1, \; kv_2) \qquad (7)$

In particular, it follows from 7 that

$-\mathbf{v} = (-1)\mathbf{v} = (-v_1, \; -v_2) \qquad (8)$

and hence that

$\mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) = (w_1 - v_1, \; w_2 - v_2) \qquad (9)$

Motivated by Formulas 6-9, we make the following definition.


DEFINITION 3

If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ are vectors in $R^n$, and if k is any scalar, then we define

$\mathbf{v} + \mathbf{w} = (v_1 + w_1, \; v_2 + w_2, \; \ldots, \; v_n + w_n) \qquad (10)$

$k\mathbf{v} = (kv_1, \; kv_2, \; \ldots, \; kv_n) \qquad (11)$

$-\mathbf{v} = (-v_1, \; -v_2, \; \ldots, \; -v_n) \qquad (12)$

$\mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) = (w_1 - v_1, \; w_2 - v_2, \; \ldots, \; w_n - v_n) \qquad (13)$
















In words, vectors are added (or subtracted) by 
adding (or subtracting) their corresponding 
components, and a vector is multiplied by a 
scalar by multiplying each component by that 
scalar. 

EXAMPLE 3 Algebraic Operations Using Components 

If v = (1, -3, 2) and w = (4, 2, 1), then

v + w = (5, -1, 3),   2v = (2, -6, 4)

-w = (-4, -2, -1),   v - w = v + (-w) = (-3, -5, 1)
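Definition 3 is entirely componentwise, so it can be mirrored with tuples. The sketch below assumes vectors are Python tuples of numbers, and the helper names `add`, `scale`, `neg`, and `sub` are hypothetical. It reproduces the computations of Example 3.

```python
def add(v, w):
    """Formula (10): add corresponding components."""
    if len(v) != len(w):
        raise ValueError("vectors must lie in the same R^n")
    return tuple(a + b for a, b in zip(v, w))


def scale(k, v):
    """Formula (11): multiply every component by the scalar k."""
    return tuple(k * a for a in v)


def neg(v):
    """Formula (12): the negative of v, computed as (-1)v."""
    return scale(-1, v)


def sub(x, y):
    """Formula (13): x - y = x + (-y)."""
    return add(x, neg(y))


# Vectors from Example 3.
v = (1, -3, 2)
w = (4, 2, 1)
print(add(v, w))    # (5, -1, 3)
print(scale(2, v))  # (2, -6, 4)
print(neg(w))       # (-4, -2, -1)
print(sub(v, w))    # (-3, -5, 1)
```

Subtraction is deliberately built from addition and negation rather than written directly. This mirrors the way Formula (13) defines w - v as w + (-v).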


The following theorem summarizes the most important properties of vector operations. 


THEOREM 3.1.1 

If u, v, and w are vectors in $R^n$, and if k and m are scalars, then:

(a) u + v = v + u

(b) (u + v) + w = u + (v + w)

(c) u + 0 = 0 + u = u

(d) u + (-u) = 0

(e) k(u + v) = ku + kv

(f) (k + m)u = ku + mu

(g) k(mu) = (km)u

(h) 1u = u


We will prove part (b) and leave some of the other proofs as exercises.

(b) Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)$, $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$. Then

$(\mathbf{u} + \mathbf{v}) + \mathbf{w} = ((u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n)) + (w_1, w_2, \ldots, w_n)$

$= (u_1 + v_1, \; u_2 + v_2, \; \ldots, \; u_n + v_n) + (w_1, w_2, \ldots, w_n)$   [Vector addition]

$= ((u_1 + v_1) + w_1, \; (u_2 + v_2) + w_2, \; \ldots, \; (u_n + v_n) + w_n)$   [Vector addition]

$= (u_1 + (v_1 + w_1), \; u_2 + (v_2 + w_2), \; \ldots, \; u_n + (v_n + w_n))$   [Regroup]

$= (u_1, u_2, \ldots, u_n) + (v_1 + w_1, \; v_2 + w_2, \; \ldots, \; v_n + w_n)$   [Vector addition]

$= \mathbf{u} + (\mathbf{v} + \mathbf{w})$
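A numerical spot-check is not a proof, but it can catch a mistyped property. The Python sketch below tests all eight parts of Theorem 3.1.1 on randomly generated vectors in $R^n$, using componentwise addition and scalar multiplication as in Definition 3. The function names are hypothetical, and restricting to integer entries is an assumption that keeps the arithmetic exact.

```python
import random


def add(u, v):
    """Componentwise sum, as in Formula (10)."""
    return tuple(a + b for a, b in zip(u, v))


def scale(k, u):
    """Componentwise scalar multiple, as in Formula (11)."""
    return tuple(k * a for a in u)


def check_theorem_3_1_1(n, trials=100, seed=0):
    """Spot-check parts (a)-(h) of Theorem 3.1.1 on random integer vectors in R^n.

    Returns True if every check passes; an AssertionError signals a failure.
    Passing checks are evidence for the theorem, not a proof of it.
    """
    rng = random.Random(seed)
    zero = (0,) * n
    for _ in range(trials):
        u, v, w = [tuple(rng.randint(-9, 9) for _ in range(n)) for _ in range(3)]
        k, m = rng.randint(-9, 9), rng.randint(-9, 9)
        assert add(u, v) == add(v, u)                                # (a)
        assert add(add(u, v), w) == add(u, add(v, w))                # (b)
        assert add(u, zero) == add(zero, u) == u                     # (c)
        assert add(u, scale(-1, u)) == zero                          # (d), with -u = (-1)u
        assert scale(k, add(u, v)) == add(scale(k, u), scale(k, v))  # (e)
        assert scale(k + m, u) == add(scale(k, u), scale(m, u))      # (f)
        assert scale(k, scale(m, u)) == scale(k * m, u)              # (g)
        assert scale(1, u) == u                                      # (h)
    return True


print(check_theorem_3_1_1(5))  # True
```

Floating-point entries could make a check such as (b) fail because of round-off even though the theorem holds, which is why the sketch sticks to integers.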


The following additional properties of vectors in R n can be deduced easily by expressing the vectors in terms 
of components (verify). 


THEOREM 3.1.2 

If v is a vector in R n and k is a scalar, then: 

(a) 0v = 0

(b) k0 = 0

(c) (-1)v = -v


Calculating Without Components 

One of the powerful consequences of Theorems 3.1.1 and 3.1.2 is that they allow calculations to be performed without expressing the vectors in terms of components. For example, suppose that x, a, and b are vectors in $R^n$, and we want to solve the vector equation x + a = b for the vector x without using components. We could proceed as follows:

x + a = b   [Given]

(x + a) + (-a) = b + (-a)   [Add the negative of a to both sides]

x + (a + (-a)) = b - a   [Part (b) of Theorem 3.1.1]

x + 0 = b - a   [Part (d) of Theorem 3.1.1]

x = b - a   [Part (c) of Theorem 3.1.1]

While this method is obviously more cumbersome than computing with components in R n , it will become 
important later in the text where we will encounter more general kinds of vectors. 
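For vectors written in components, the conclusion x = b - a can be confirmed directly. The Python sketch below is illustrative. The vectors a and b are made-up values, and `solve_translation` is a hypothetical name.

```python
def solve_translation(a, b):
    """Return the vector x in R^n with x + a = b, namely x = b + (-a) = b - a."""
    if len(a) != len(b):
        raise ValueError("a and b must lie in the same R^n")
    return tuple(bi - ai for ai, bi in zip(a, b))


# Illustrative vectors in R^3 (not from the text).
a = (1, -2, 5)
b = (4, 0, -1)
x = solve_translation(a, b)
print(x)  # (3, 2, -6)

# Substitute back: x + a should reproduce b.
assert tuple(xi + ai for xi, ai in zip(x, a)) == b
```

The component computation is shorter here, but it depends on the vectors being n-tuples. The step-by-step argument above uses only the properties in Theorems 3.1.1 and 3.1.2.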


Linear Combinations 

Addition, subtraction, and scalar multiplication are frequently used in combination to form new vectors. For example, if $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are vectors in $R^n$, then the vectors

$\mathbf{u} = 2\mathbf{v}_1 + 3\mathbf{v}_2 + \mathbf{v}_3 \quad \text{and} \quad \mathbf{w} = 7\mathbf{v}_1 - 6\mathbf{v}_2 + 8\mathbf{v}_3$

are formed in this way. In general, we make the following definition.


DEFINITION 4 

If w is a vector in $R^n$, then w is said to be a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ in $R^n$ if it can be expressed in the form

$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r \qquad (14)$

where $k_1, k_2, \ldots, k_r$ are scalars. These scalars are called the coefficients of the linear combination. In the case where r = 1, Formula 14 becomes $\mathbf{w} = k_1\mathbf{v}_1$, so that a linear combination of a single vector is just a scalar multiple of that vector.


Note that this definition of a linear combination 
is consistent with that given in the context of 
matrices (see Definition 6 in Section 1.3). 
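Formula 14 can be evaluated one component at a time. The sketch below assumes vectors are tuples of equal length. The name `linear_combination` is hypothetical, and v1, v2, v3 are made-up vectors in $R^3$. The code forms the two combinations u and w displayed before Definition 4.

```python
def linear_combination(coeffs, vectors):
    """Return k_1 v_1 + k_2 v_2 + ... + k_r v_r, as in Formula 14.

    coeffs holds the scalars k_1, ..., k_r and vectors holds v_1, ..., v_r;
    all vectors must have the same number of components n.
    """
    if len(coeffs) != len(vectors) or not vectors:
        raise ValueError("need r >= 1 coefficients and the same number of vectors")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must lie in the same R^n")
    # The i-th component of the result is k_1 (v_1)_i + ... + k_r (v_r)_i.
    return tuple(sum(k * v[i] for k, v in zip(coeffs, vectors)) for i in range(n))


# Illustrative vectors in R^3 (not taken from the text).
v1, v2, v3 = (1, 0, 2), (0, 1, -1), (3, 1, 0)
print(linear_combination([2, 3, 1], [v1, v2, v3]))   # u = 2v1 + 3v2 + v3 = (5, 4, 1)
print(linear_combination([7, -6, 8], [v1, v2, v3]))  # w = 7v1 - 6v2 + 8v3 = (31, 2, 20)
```

With a single vector (r = 1) the function returns a scalar multiple, as the definition notes. For example, `linear_combination([4], [(1, 2)])` gives (4, 8).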


Application of Linear Combinations to Color Models 

Colors on computer monitors are commonly based on what is called the RGB color model. Colors in this system are created by adding together percentages of the primary colors red (R), green (G), and blue (B). One way to do this is to identify the primary colors with the vectors

r = (1, 0, 0) (pure red),
g = (0, 1, 0) (pure green),
b = (0, 0, 1) (pure blue)

in $R^3$ and to create all other colors by forming linear combinations of r, g, and b using coefficients between 0 and 1, inclusive; these coefficients represent the percentage of each pure color in the mix. The set of all such color vectors is called RGB space or the RGB color cube (Figure 3.1.14). Thus, each color vector c in this cube is expressible as a linear combination of the form

$\mathbf{c} = k_1\mathbf{r} + k_2\mathbf{g} + k_3\mathbf{b} = k_1(1, 0, 0) + k_2(0, 1, 0) + k_3(0, 0, 1) = (k_1, k_2, k_3)$

where $0 \le k_i \le 1$. As indicated in the figure, the corners of the cube represent the pure primary colors together with the colors black, white, magenta, cyan, and yellow. The vectors along the diagonal running from black to white correspond to shades of gray.


Figure 3.1.14 The RGB color cube, with corners Black (0, 0, 0), Red (1, 0, 0), Green (0, 1, 0), Blue (0, 0, 1), Yellow (1, 1, 0), Magenta (1, 0, 1), Cyan (0, 1, 1), and White (1, 1, 1).
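The RGB construction is a linear combination whose coefficients are restricted to the interval from 0 to 1. The Python sketch below is illustrative only. `rgb_color` is a hypothetical name, coefficients are assumed to be plain numbers, and the range check enforces the "between 0 and 1, inclusive" requirement from the text.

```python
def rgb_color(k1, k2, k3):
    """Return c = k1*r + k2*g + k3*b for the primaries r, g, b of the RGB cube.

    Each coefficient must lie between 0 and 1, inclusive. Because r, g, b are
    the standard unit vectors, the result is simply (k1, k2, k3).
    """
    r, g, b = (1, 0, 0), (0, 1, 0), (0, 0, 1)
    if not all(0 <= k <= 1 for k in (k1, k2, k3)):
        raise ValueError("RGB coefficients must lie between 0 and 1, inclusive")
    return tuple(k1 * ri + k2 * gi + k3 * bi for ri, gi, bi in zip(r, g, b))


print(rgb_color(1, 0, 1))        # (1, 0, 1): the magenta corner
print(rgb_color(0.5, 0.5, 0.5))  # (0.5, 0.5, 0.5): a gray on the black-white diagonal
```

Equal coefficients always produce a point on the diagonal from black (0, 0, 0) to white (1, 1, 1), which is why those vectors are shades of gray.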


Alternative Notations for Vectors 


Up to now we have been writing vectors in $R^n$ using the notation

$\mathbf{v} = (v_1, v_2, \ldots, v_n) \qquad (15)$

We call this the comma-delimited form. However, since a vector in $R^n$ is just a list of its n components in a specific order, any notation that displays those components in the correct order is a valid way of representing the vector. For example, the vector in 15 can be written as

$\mathbf{v} = [\,v_1 \;\; v_2 \;\; \cdots \;\; v_n\,] \qquad (16)$

which is called row-matrix form, or as

$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \qquad (17)$

which is called column-matrix form. The choice of notation is often a matter of taste or convenience, but sometimes the nature of a problem will suggest a preferred notation. Notations 15, 16, and 17 will all be used at various places in this text.


Concept Review 

Geometric vector 
Direction 
Length 
Initial point 
Terminal point 
Equivalent vectors 












Zero vector 

Vector addition: parallelogram rule and triangle rule 

Vector subtraction 

Negative of a vector 

Scalar multiplication 

Collinear (i.e., parallel) vectors 

Components of a vector 

Coordinates of a point 

n-tuple

n-space

Vector operations in n-space: addition, subtraction, scalar multiplication
Linear combination of vectors 

Skills 

Perform geometric operations on vectors: addition, subtraction, and scalar multiplication. 
Perform algebraic operations on vectors: addition, subtraction, and scalar multiplication. 
Determine whether two vectors are equivalent. 

Determine whether two vectors are collinear. 

Sketch vectors whose initial and terminal points are given. 

Find components of a vector whose initial and terminal points are given. 

Prove basic algebraic properties of vectors (Theorems 3.1.1 and 3.1.2). 


Exercise Set 3.1 

In Exercises 1-2, draw a coordinate system (as in Figure 3.1.10) and locate the points whose coordinates are given.


1. (a) (3, 4, 5)

(b) (-3, 4, 5)

(c) (3, -4, 5)

(d) (3, 4, -5)

(e) (-3, -4, 5)

(f) (-3, 4, -5)


Answer: 


[Figure: the points in (a)-(f) located in an xyz-coordinate system]


2. (a) (0, 3, -3)

(b) (3, -3, 0)

(c) (-3, 0, 0)

(d) (3, 0, 3)

(e) (0, 0, -3)

(f) (0, 3, 0)

In Exercises 3-4, sketch the following vectors with the initial points located at the origin. 

3. (a) $\mathbf{v}_1 = (3, 6)$

(b) $\mathbf{v}_2 = (-4, -8)$

(c) $\mathbf{v}_3 = (-4, -3)$

(d) $\mathbf{v}_4 = (3, 4, 5)$

(e) $\mathbf{v}_5 = (3, 3, 0)$

(f) $\mathbf{v}_6 = (-1, 0, 2)$


Answer: 



[Figure: the vectors in (a)-(f) drawn with their initial points at the origin]

4. (a) $\mathbf{v}_1 = (5, -4)$

(b) $\mathbf{v}_2 = (3, 0)$

(c) $\mathbf{v}_3 = (0, -7)$

(d) $\mathbf{v}_4 = (0, 0, -3)$

(e) $\mathbf{v}_5 = (0, 4, -1)$

(f) $\mathbf{v}_6 = (2, 2, 2)$

In Exercises 5-6, sketch the vector $\overrightarrow{P_1P_2}$ with its initial point located at the origin.


5. (a) $P_1(4, 8)$, $P_2(3, 7)$

(b) $P_1(3, -5)$, $P_2(-4, -7)$

(c) $P_1(3, -7, 2)$, $P_2(-2, 5, -4)$

Answer: 


[Figure: the vectors in (a)-(c) drawn with their initial points at the origin]


6. (a) $P_1(-5, 0)$, $P_2(-3, 1)$

(b) $P_1(0, 0)$, $P_2(3, 4)$

(c) $P_1(-1, 0, 2)$, $P_2(0, -1, 0)$

(d) $P_1(2, 2, 2)$, $P_2(0, 0, 0)$

In Exercises 7-8, find the components of the vector $\overrightarrow{P_1P_2}$.


7. (a) $P_1(3, 5)$, $P_2(2, 8)$

(b) $P_1(5, -2, 1)$, $P_2(2, 4, 2)$


Answer:

(a) $\overrightarrow{P_1P_2} = (-1, 3)$

(b) $\overrightarrow{P_1P_2} = (-3, 6, 1)$

8. (a) $P_1(-6, 2)$, $P_2(-4, -1)$

(b) $P_1(0, 0, 0)$, $P_2(-1, 6, 1)$

9. (a) Find the terminal point of the vector that is equivalent to u = (1, 2) and whose initial point is A(1, 1).

(b) Find the initial point of the vector that is equivalent to u = (1, 1, 3) and whose terminal point is B(-1, -1, 2).

Answer:

(a) The terminal point is B(2, 3).

(b) The initial point is A(-2, -2, -1).

10. (a) Find the initial point of the vector that is equivalent to u = (1, 2) and whose terminal point is B(2, 0).

(b) Find the terminal point of the vector that is equivalent to u = (1, 1, 3) and whose initial point is A(0, 2, 0).

11. Find a nonzero vector u with terminal point Q(3, 0, -5) such that

(a) u has the same direction as v = (4, -2, -1).

(b) u is oppositely directed to v = (4, -2, -1).

Answer:

(a) u = (4, -2, -1), with initial point (-1, 2, -4), is one possible answer.

(b) u = (-4, 2, 1), with initial point (7, -2, -6), is one possible answer.

12. Find a nonzero vector u with initial point P(-1, 3, -5) such that

(a) u has the same direction as v = (6, 7, -3).

(b) u is oppositely directed to v = (6, 7, -3).

13. Let u = (4, -1), v = (0, 5), and w = (-3, -3). Find the components of

(a) u + w

(b) v - 3u

(c) 2(u - 5w)

(d) 3v - 2(u + 2w)

(e) -3(w - 2u + v)

(f) (-2u - v) - 5(v + 3w)

Answer:

(a) u + w = (1, -4)

(b) v - 3u = (-12, 8)

(c) 2(u - 5w) = (38, 28)

(d) 3v - 2(u + 2w) = (4, 29)

(e) -3(w - 2u + v) = (33, -12)

(f) (-2u - v) - 5(v + 3w) = (37, 17)

14. Let u = (-3, 1, 2), v = (4, 0, -8), and w = (6, -1, -4). Find the components of

(a) v - w

(b) 6u + 2v

(c) -v + u

(d) 5(v - 4u)

(e) -3(v - 8w)

(f) (2u - 7w) - (8v + u)

15. Let u = (-3, 2, 1, 0), v = (4, 7, -3, 2), and w = (5, -2, 8, 1). Find the components of

(a) v - w

(b) 2u + 7v

(c) -u + (v - 4w)

(d) 6(u - 3v)

(e) -v - w

(f) (6v - w) - (4u + v)

Answer:

(a) (-1, 9, -11, 1)

(b) (22, 53, -19, 14)

(c) (-13, 13, -36, -2)

(d) (-90, -114, 60, -36)

(e) (-9, -5, -5, -3)

(f) (27, 29, -27, 9)

16. Let u, v, and w be the vectors in Exercise 15. Find the vector x that satisfies 5x - 2v = 2(w - 5x).

17. Let u = (5, -1, 0, 3, -3), v = (-1, -1, 7, 2, 0), and w = (-4, 2, -3, -5, 2). Find the components of

(a) w - u

(b) 2v + 3u

(c) -w + 3(v - u)

(d) 5(-v + 4u - w)

(e) -2(3w + v) + (2u + w)

(f) $\tfrac{1}{2}(\mathbf{w} - 5\mathbf{v} + 2\mathbf{u}) + \mathbf{v}$

Answer:

(a) w - u = (-9, 3, -3, -8, 5)

(b) 2v + 3u = (13, -5, 14, 13, -9)

(c) -w + 3(v - u) = (-14, -2, 24, 2, 7)

(d) 5(-v + 4u - w) = (125, -25, -20, 75, -70)

(e) -2(3w + v) + (2u + w) = (32, -10, 1, 27, -16)

(f) $\tfrac{1}{2}(\mathbf{w} - 5\mathbf{v} + 2\mathbf{u}) + \mathbf{v} = \left(\tfrac{9}{2}, \tfrac{3}{2}, -12, -\tfrac{5}{2}, -2\right)$

18. Let u = (1, 2, -3, 5, 0), v = (0, 4, -1, 1, 2), and w = (7, 1, -4, -2, 3). Find the components of

(a) v + w

(b) 3(2u - v)

(c) (3u - v) - (2u + 4w)

19. Let u = (-3, 1, 2, 4, 4), v = (4, 0, -8, 1, 2), and w = (6, -1, -4, 3, -5). Find the components of

(a) v - w

(b) 6u + 2v

(c) (2u - 7w) - (8v + u)

Answer:

(a) v - w = (-2, 1, -4, -2, 7)

(b) 6u + 2v = (-10, 6, -4, 26, 28)

(c) (2u - 7w) - (8v + u) = (-77, 8, 94, -25, 23)

20. Let u, v, and w be the vectors in Exercise 18. Find the components of the vector x that satisfies the equation 3u + v - 2w = 3x + 2w.

21. Let u, v, and w be the vectors in Exercise 19. Find the components of the vector x that satisfies the equation 2u - v + x = 7x + w.

Answer:

$\mathbf{x} = \left(-\tfrac{8}{3}, \tfrac{1}{2}, \tfrac{8}{3}, \tfrac{2}{3}, \tfrac{11}{6}\right)$

22. For what value(s) of t, if any, is the given vector parallel to u = (4, -1)?

(a) (8t, -2)

(b)

(c) $(1, t^2)$

23. Which of the following vectors in $R^6$ are parallel to u = (-2, 1, 0, 3, 5, 1)?

(a) (4, 2, 0, 6, 10, 2)

(b) (4, -2, 0, -6, -10, -2)

(c) (0, 0, 0, 0, 0, 0)

Answer:

(a) Not parallel

(b) Parallel

(c) Parallel


24. Let u = (2, 1, 0, 1, — 1) and v = ( — 2, 3, 1, 0, 2) . Find scalars a and b so that 
cm -F bv = ( — 8 , 8 , 3, —1,7). 

25. Letu=(l, — 1, 3, 5) and v = (2, 1, 0, — 3). Find scalars a and 6 so that cm 4 -&v= (1, —4,9,18). 
Answer: 

a = 3, b = — 1 

26. Find all scalars c₁, c₂, and c₃ such that

c₁(1, 2, 0) + c₂(2, 1, 1) + c₃(0, 3, 1) = (0, 0, 0)

27. Find all scalars c₁, c₂, and c₃ such that

c₁(1, −1, 0) + c₂(3, 2, 1) + c₃(0, 1, 4) = (−1, 1, 19)

Answer:

c₁ = 2, c₂ = −1, c₃ = 5

28. Find all scalars c₁, c₂, and c₃ such that

c₁(−1, 0, 2) + c₂(2, 2, −2) + c₃(1, −2, 1) = (−6, 12, 4)

29. Let u₁ = (−1, 3, 2, 0), u₂ = (2, 0, 4, −1), u₃ = (7, 1, 1, 4), and u₄ = (6, 3, 1, 2). Find scalars c₁,
c₂, c₃, and c₄ such that c₁u₁ + c₂u₂ + c₃u₃ + c₄u₄ = (0, 5, 6, −3).

Answer:

c₁ = 1, c₂ = 1, c₃ = −1, c₄ = 1

30. Show that there do not exist scalars c₁, c₂, and c₃ such that

c₁(1, 0, 1, 0) + c₂(1, 0, −2, 1) + c₃(2, 0, 1, 2) = (1, −2, 2, 3)

31. Show that there do not exist scalars c₁, c₂, and c₃ such that

c₁(−2, 9, 6) + c₂(−3, 2, 1) + c₃(1, 7, 5) = (0, 5, 4)

32. Consider Figure 3.1.12. Discuss a geometric interpretation of the vector

$$\mathbf{u} = \overrightarrow{OP_1} + \tfrac{1}{2}\left(\overrightarrow{OP_2} - \overrightarrow{OP_1}\right)$$

33. Let P be the point (2, 3, −2) and Q the point (7, −4, 1).

(a) Find the midpoint of the line segment connecting P and Q.

(b) Find the point on the line segment connecting P and Q that is … of the way from P to Q.

Answer:

(a) (9/2, −1/2, −1/2)

(b) …

34. Let P be the point (1, 3, 7). If the point (4, 0, −6) is the midpoint of the line segment connecting P and
Q, what is Q?

35. Prove parts (a), (c), and (d) of Theorem 3.1.1.

36. Prove parts (e)-(h) of Theorem 3.1.1.

37. Prove parts (a)-(c) of Theorem 3.1.2.

True-False Exercises 

In parts (a)-(k) determine whether the statement is true or false, and justify your answer. 

(a) Two equivalent vectors must have the same initial point. 

Answer: 

False 

(b) The vectors (a, b) and (a, b, 0) are equivalent. 

Answer: 

False 

(c) If k is a scalar and v is a vector, then v and kv are parallel if and only if k ≥ 0.
Answer: 

False 

(d) The vectors v + (u + w) and (w + v) + u are the same.

Answer: 

True 

(e) If u + v = u + w, then v = w.

Answer: 

True 

(f) If a and b are scalars such that au + bv = 0, then u and v are parallel vectors.

Answer: 

False 

(g) Collinear vectors with the same length are equal. 

Answer: 

False 

(h) If (a, b, c) + (x, y, z) = (x, y, z), then (a, b, c) must be the zero vector.


Answer: 


True 

(i) If k and m are scalars and u and v are vectors, then 

(k + m)(u + v) = ku + mv


Answer: 

False 

(j) If the vectors v and w are given, then the vector equation 

3(2v − x) = 5x − 4w + v

can be solved for x. 

Answer: 

True 

(k) The linear combinations a₁v₁ + a₂v₂ and b₁v₁ + b₂v₂ can only be equal if a₁ = b₁ and a₂ = b₂.
Answer: 

False 
3.2 Norm, Dot Product, and Distance in Rⁿ

In this section we will be concerned with the notions of length and distance as they relate to vectors. We will
first discuss these ideas in R² and R³ and then extend them algebraically to Rⁿ.


Norm of a Vector 

In this text we will denote the length of a vector v by the symbol ||v||, which is read as the norm of v, the
length of v, or the magnitude of v (the term “norm” being a common mathematical synonym for length). As
suggested in Figure 3.2.1a, it follows from the Theorem of Pythagoras that the norm of a vector (v₁, v₂) in R²
is

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2} \qquad (1)$$

Similarly, for a vector (v₁, v₂, v₃) in R³, it follows from Figure 3.2.1b and two applications of the Theorem of
Pythagoras that

$$\|\mathbf{v}\|^2 = (OR)^2 + (RP)^2 = (OQ)^2 + (QR)^2 + (RP)^2 = v_1^2 + v_2^2 + v_3^2$$

and hence that

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2} \qquad (2)$$

Motivated by the pattern of Formulas 1 and 2 we make the following definition. 


DEFINITION 1 

If v = (v₁, v₂, ..., vₙ) is a vector in Rⁿ, then the norm of v (also called the length of v or the
magnitude of v) is denoted by ||v||, and is defined by the formula

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \qquad (3)$$


EXAMPLE 1 Calculating Norms 

It follows from Formula 2 that the norm of the vector v = (−3, 2, 1) in R³ is

$$\|\mathbf{v}\| = \sqrt{(-3)^2 + 2^2 + 1^2} = \sqrt{14}$$

and it follows from Formula 3 that the norm of the vector v = (2, −1, 3, −5) in R⁴ is

$$\|\mathbf{v}\| = \sqrt{2^2 + (-1)^2 + 3^2 + (-5)^2} = \sqrt{39}$$
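Formula 3 translates directly into a few lines of code. The following is a minimal Python sketch, assuming vectors are represented as plain tuples of numbers; the function name `norm` is our own illustrative choice, not something defined in the text.

```python
import math

def norm(v):
    """Euclidean norm of v = (v_1, ..., v_n), computed as in Formula 3."""
    return math.sqrt(sum(component * component for component in v))

# The two vectors from Example 1.
print(norm((-3, 2, 1)), math.sqrt(14))      # both equal sqrt(14)
print(norm((2, -1, 3, -5)), math.sqrt(39))  # both equal sqrt(39)
```

Nothing in the function depends on n, which mirrors the point of Definition 1: the same formula serves R², R³, and Rⁿ.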
Figure 3.2.1


Our first theorem in this section will generalize to Rⁿ the following three familiar facts about vectors in R² and
R³:


Distances are nonnegative. 

The zero vector is the only vector of length zero. 

Multiplying a vector by a scalar multiplies its length by the absolute value of that scalar. 

It is important to recognize that just because these results hold in R² and R³ does not guarantee that they hold
in Rⁿ; their validity in Rⁿ must be proved using algebraic properties of n-tuples.


THEOREM 3.2.1 

If v is a vector in Rⁿ, and if k is any scalar, then:

(a) ||v|| ≥ 0

(b) ||v|| = 0 if and only if v = 0

(c) ||kv|| = |k| ||v||

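We will prove part (c) and leave (a) and (b) as exercises.

Proof (c) If v = (v₁, v₂, ..., vₙ), then kv = (kv₁, kv₂, ..., kvₙ), so

$$\begin{aligned}
\|k\mathbf{v}\| &= \sqrt{(kv_1)^2 + (kv_2)^2 + \cdots + (kv_n)^2} \\
&= \sqrt{k^2\,(v_1^2 + v_2^2 + \cdots + v_n^2)} \\
&= |k|\sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \\
&= |k|\,\|\mathbf{v}\|
\end{aligned}$$


Unit Vectors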


A vector of norm 1 is called a unit vector. Such vectors are useful for specifying a direction when length is not
relevant to the problem at hand. You can obtain a unit vector in a desired direction by choosing any nonzero
vector v in that direction and multiplying v by the reciprocal of its length. For example, if v is a vector of
length 2 in R² or R³, then ½v is a unit vector in the same direction as v. More generally, if v is any nonzero
vector in Rⁿ, then

$$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\,\mathbf{v} \qquad (4)$$

defines a unit vector that is in the same direction as v. We can confirm that (4) is a unit vector by applying part
(c) of Theorem 3.2.1 with k = 1/||v|| to obtain

$$\|\mathbf{u}\| = \|k\mathbf{v}\| = |k|\,\|\mathbf{v}\| = k\,\|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\,\|\mathbf{v}\| = 1$$

The process of multiplying a nonzero vector v by the reciprocal of its length to obtain a unit vector is called
normalizing v.

WARNING 


Sometimes you will see Formula 4 expressed as

$$\mathbf{u} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$

This is just a more compact way of writing that
formula and is not intended to convey that v is
being divided by ||v||.


EXAMPLE 2 Normalizing a Vector 

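Find the unit vector u that has the same direction as v = (2, 2, −1).

Solution The vector v has length

$$\|\mathbf{v}\| = \sqrt{2^2 + 2^2 + (-1)^2} = 3$$

Thus, from (4)

$$\mathbf{u} = \tfrac{1}{3}(2, 2, -1) = \left(\tfrac{2}{3}, \tfrac{2}{3}, -\tfrac{1}{3}\right)$$

As a check, you may want to confirm that ||u|| = 1.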
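The normalization step of Formula 4 can be sketched in Python as follows. This is an illustration only; the helper name `normalize` is hypothetical, and the explicit check for the zero vector reflects the requirement that v be nonzero.

```python
import math

def normalize(v):
    """Return the unit vector (1/||v||) v of Formula 4; v must be nonzero."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(c / length for c in v)

u = normalize((2, 2, -1))                 # Example 2
print(u)                                  # approximately (2/3, 2/3, -1/3)
print(math.sqrt(sum(c * c for c in u)))   # approximately 1, the suggested check
```

Floating-point division means the check ||u|| = 1 holds only up to round-off, which is why the comments say "approximately".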


The Standard Unit Vectors 


When a rectangular coordinate system is introduced in R² or R³, the unit vectors in the positive directions of
the coordinate axes are called the standard unit vectors. In R² these vectors are denoted by

i = (1, 0) and j = (0, 1)

and in R³ by

i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1)

(Figure 3.2.2). Every vector v = (v₁, v₂) in R² and every vector v = (v₁, v₂, v₃) in R³ can be expressed as a
linear combination of standard unit vectors by writing

$$\mathbf{v} = (v_1, v_2) = v_1(1, 0) + v_2(0, 1) = v_1\mathbf{i} + v_2\mathbf{j} \qquad (5)$$

$$\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k} \qquad (6)$$

Moreover, we can generalize these formulas to Rⁿ by defining the standard unit vectors in Rⁿ to be

$$\mathbf{e}_1 = (1, 0, 0, \ldots, 0),\quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0),\quad \ldots,\quad \mathbf{e}_n = (0, 0, 0, \ldots, 1) \qquad (7)$$

in which case every vector v = (v₁, v₂, ..., vₙ) in Rⁿ can be expressed as

$$\mathbf{v} = (v_1, v_2, \ldots, v_n) = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n \qquad (8)$$

EXAMPLE 3 Linear Combinations of Standard Unit Vectors 

(2, −3, 4) = 2i − 3j + 4k

(7, 3, −4, 5) = 7e₁ + 3e₂ − 4e₃ + 5e₄
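Formulas 7 and 8 can be checked mechanically: build e₁, ..., eₙ and form the linear combination whose coefficients are the components of v. This Python sketch is illustrative only; `standard_unit_vector` and `linear_combination` are hypothetical helper names, and indices count from 1 to match the text.

```python
def standard_unit_vector(i, n):
    """Return e_i in R^n from Formula 7 (i counts from 1, as in the text)."""
    if not 1 <= i <= n:
        raise ValueError("need 1 <= i <= n")
    return tuple(1 if j == i else 0 for j in range(1, n + 1))

def linear_combination(coefficients, vectors):
    """Return c_1 v_1 + c_2 v_2 + ... + c_k v_k, computed componentwise."""
    n = len(vectors[0])
    return tuple(sum(c * vec[j] for c, vec in zip(coefficients, vectors))
                 for j in range(n))

n = 4
e = [standard_unit_vector(i, n) for i in range(1, n + 1)]   # e_1, ..., e_4
print(linear_combination((7, 3, -4, 5), e))   # (7, 3, -4, 5), as in Example 3
```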


Figure 3.2.2 (a) The standard unit vectors i = (1, 0) and j = (0, 1) in R². (b) The standard unit vectors i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1) in R³.


Distance in Rⁿ

If P₁ and P₂ are points in R² or R³, then the length of the vector $\overrightarrow{P_1P_2}$ is equal to the distance d between the
two points (Figure 3.2.3). Specifically, if P₁(x₁, y₁) and P₂(x₂, y₂) are points in R², then Formula 4 of
Section 3.1 implies that

$$d = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \qquad (9)$$

This is the familiar distance formula from analytic geometry. Similarly, the distance between the points
P₁(x₁, y₁, z₁) and P₂(x₂, y₂, z₂) in 3-space is

$$d = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} \qquad (10)$$

Motivated by Formulas 9 and 10, we make the following definition. 


DEFINITION 2 

If u = (u₁, u₂, ..., uₙ) and v = (v₁, v₂, ..., vₙ) are points in Rⁿ, then we denote the distance between
u and v by d(u, v) and define it to be

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2} \qquad (11)$$










Figure 3.2.3 


We noted in the previous section that n-tuples 
can be viewed either as vectors or points in Rⁿ.
In Definition 2 we chose to describe them as 
points, as that seemed the more natural 
interpretation. 


EXAMPLE 4 Calculating Distance in Rⁿ

If

u = (1, 3, −2, 7) and v = (0, 7, 2, 2)

then the distance between u and v is

$$d(\mathbf{u}, \mathbf{v}) = \sqrt{(1 - 0)^2 + (3 - 7)^2 + (-2 - 2)^2 + (7 - 2)^2} = \sqrt{58}$$
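Definition 2 says that distance is just the norm of a difference, so a sketch needs only one line of arithmetic. The Python below is a minimal illustration under the assumption that points are tuples of equal length; the name `distance` is our own.

```python
import math

def distance(u, v):
    """Euclidean distance d(u, v) = ||u - v|| from Formula 11."""
    if len(u) != len(v):
        raise ValueError("u and v must lie in the same R^n")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(distance((1, 3, -2, 7), (0, 7, 2, 2)))   # sqrt(58), roughly 7.616
```

Because (a − b)² = (b − a)², the function returns the same value if u and v are swapped, as a distance should.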


Dot Product 

Our next objective is to define a useful multiplication operation on vectors in R² and R³ and then extend that
operation to Rⁿ. To do this we will first need to define exactly what we mean by the “angle” between two
vectors in R² or R³. For this purpose, let u and v be nonzero vectors in R² or R³ that have been positioned so
that their initial points coincide. We define the angle between u and v to be the angle θ determined by u and v
that satisfies the inequalities 0 ≤ θ ≤ π (Figure 3.2.4).


DEFINITION 3

If u and v are nonzero vectors in R² or R³, and if θ is the angle between u and v, then the dot product
(also called the Euclidean inner product) of u and v is denoted by u · v and is defined as

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta \qquad (12)$$


If u = 0 or v = 0, then we define u · v to be 0.




Figure 3.2.4 The angle θ between u and v satisfies 0 ≤ θ ≤ π.


The sign of the dot product reveals information about the angle θ that we can obtain by rewriting Formula 12
as

$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \qquad (13)$$

Since 0 ≤ θ ≤ π, it follows from Formula 13 and properties of the cosine function studied in trigonometry that

• θ is acute if u · v > 0.

• θ is obtuse if u · v < 0.

• θ = π/2 if u · v = 0.


EXAMPLE 5 Dot Product 


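Find the dot product of the vectors shown in Figure 3.2.5.

Figure 3.2.5 The vectors u = (0, 0, 1) and v = (0, 2, 2) in R³.

Solution The lengths of the vectors are

$$\|\mathbf{u}\| = 1 \quad\text{and}\quad \|\mathbf{v}\| = \sqrt{8} = 2\sqrt{2}$$

and the cosine of the angle θ between them is

$$\cos(45^\circ) = 1/\sqrt{2}$$

Thus, it follows from Formula 12 that

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta = (1)(2\sqrt{2})(1/\sqrt{2}) = 2$$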


EXAMPLE 6 A Geometry Problem Solved Using Dot Product 


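Find the angle between a diagonal of a cube and one of its edges.

Solution Let k be the length of an edge and introduce a coordinate system as shown in Figure 3.2.6.
If we let u₁ = (k, 0, 0), u₂ = (0, k, 0), and u₃ = (0, 0, k), then the vector

d = (k, k, k) = u₁ + u₂ + u₃

is a diagonal of the cube. It follows from Formula 13 that the angle θ between d and the edge u₁
satisfies

$$\cos\theta = \frac{\mathbf{u}_1 \cdot \mathbf{d}}{\|\mathbf{u}_1\|\,\|\mathbf{d}\|} = \frac{k^2}{(k)(\sqrt{3}\,k)} = \frac{1}{\sqrt{3}}$$

With the help of a calculator we obtain

$$\theta = \cos^{-1}\left(\frac{1}{\sqrt{3}}\right) \approx 54.74^\circ$$

Figure 3.2.6 A cube of edge length k with vertices (k, 0, 0), (0, k, 0), (0, 0, k), and (k, k, k); u₁, u₂, u₃ lie along the coordinate axes and d runs from the origin to (k, k, k).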
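The calculation in Example 6 is easy to repeat numerically for several edge lengths. The sketch below is an illustration only: it computes u₁ · d from components (the component formula derived later in this section as Formula 15), and the helper names `dot` and `angle_degrees` are our own.

```python
import math

def dot(u, v):
    """Dot product computed from components."""
    return sum(a * b for a, b in zip(u, v))

def angle_degrees(u, v):
    """Angle between nonzero u and v from Formula 13, converted to degrees."""
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.degrees(math.acos(cos_theta))

for k in (1.0, 2.5, 10.0):          # several edge lengths
    edge = (k, 0.0, 0.0)            # u_1
    diagonal = (k, k, k)            # d = u_1 + u_2 + u_3
    print(k, round(angle_degrees(edge, diagonal), 2))   # 54.74 each time
```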



Note that the angle θ obtained in Example 6
does not involve k. Why was this to be 
expected? 


Component Form of the Dot Product 

For computational purposes it is desirable to have a formula that expresses the dot product of two vectors in
terms of components. We will derive such a formula for vectors in 3-space; the derivation for vectors in
2-space is similar.

Let u = (u₁, u₂, u₃) and v = (v₁, v₂, v₃) be two nonzero vectors. If, as shown in Figure 3.2.7, θ is the angle
between u and v, then the law of cosines yields

$$\|\overrightarrow{PQ}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta \qquad (14)$$



Josiah Willard Gibbs (1839-1903) 


The dot product notation was first introduced by the American physicist and 
mathematician J. Willard Gibbs in a pamphlet distributed to his students at Yale University in the 
1880s. The product was originally written on the baseline, rather than centered as today, and was 
referred to as the direct product. Gibbs’s pamphlet was eventually incorporated into a book entitled 
Vector Analysis that was published in 1901 and coauthored with one of his students. Gibbs made major 
contributions to the fields of thermodynamics and electromagnetic theory and is generally regarded as 
the greatest American physicist of the nineteenth century. 

[Image: The Granger Collection, New York] 

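Since $\overrightarrow{PQ}$ = v − u, we can rewrite (14) as

$$\|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2\right)$$

or

$$\mathbf{u} \cdot \mathbf{v} = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2\right)$$

Substituting

$$\|\mathbf{u}\|^2 = u_1^2 + u_2^2 + u_3^2, \qquad \|\mathbf{v}\|^2 = v_1^2 + v_2^2 + v_3^2$$

and

$$\|\mathbf{v} - \mathbf{u}\|^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2$$

we obtain, after simplifying,

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + u_3v_3 \qquad (15)$$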


Although we derived Formula 15 and its
2-space companion under the assumption that u
and v are nonzero, it turns out that these
formulas are also applicable if u = 0 or v = 0
(verify).

The companion formula for vectors in 2-space is

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 \qquad (16)$$

Motivated by the pattern in Formulas 15 and 16, we make the following definition. 


DEFINITION 4 

If u = (u₁, u₂, ..., uₙ) and v = (v₁, v₂, ..., vₙ) are vectors in Rⁿ, then the dot product (also called the
Euclidean inner product) of u and v is denoted by u · v and is defined by

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n \qquad (17)$$


In words, to calculate the dot product 
(Euclidean inner product) multiply 
corresponding components and add the 
resulting products. 


EXAMPLE 7 Calculating Dot Products Using Components 

(a) Use Formula 15 to compute the dot product of the vectors u and v in Example 5.

(b) Calculate u · v for the following vectors in R⁴:

u = (−1, 3, 5, 7), v = (−3, −4, 1, 0)

Solution

(a) The component forms of the vectors are u = (0, 0, 1) and v = (0, 2, 2). Thus,

u · v = (0)(0) + (0)(2) + (1)(2) = 2

which agrees with the result obtained geometrically in Example 5.

(b) u · v = (−1)(−3) + (3)(−4) + (5)(1) + (7)(0) = −4

Figure 3.2.7 The vectors u and v with terminal points P(u₁, u₂, u₃) and Q(v₁, v₂, v₃), and the angle θ between them.
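Formula 17 ("multiply corresponding components and add") is a one-line computation. The Python sketch below, with an illustrative helper `dot`, reproduces both parts of Example 7 and compares part (a) with the geometric value ||u|| ||v|| cos 45° from Example 5.

```python
import math

def dot(u, v):
    """Dot product from Formula 17: multiply matching components and add."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same number of components")
    return sum(a * b for a, b in zip(u, v))

# Example 7(a): component formula versus the geometric value from Example 5.
u, v = (0, 0, 1), (0, 2, 2)
geometric = 1 * math.sqrt(8) * math.cos(math.radians(45))
print(dot(u, v), geometric)                 # 2 and (up to round-off) 2.0

# Example 7(b)
print(dot((-1, 3, 5, 7), (-3, -4, 1, 0)))   # -4
```

The component version uses only integer arithmetic here, so it is exact, while the geometric version picks up floating-point round-off from the square root and cosine.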


Algebraic Properties of the Dot Product 


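In the special case where u = v in Definition 4, we obtain the relationship

$$\mathbf{v} \cdot \mathbf{v} = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2 \qquad (18)$$

This yields the following formula for expressing the length of a vector in terms of a dot product:

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} \qquad (19)$$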


Dot products have many of the same algebraic properties as products of real numbers. 

THEOREM 3.2.2 


If u, v, and w are vectors in Rⁿ, and if k is a scalar, then:

(a) u · v = v · u [Symmetry property]

(b) u · (v + w) = u · v + u · w [Distributive property]

(c) k(u · v) = (ku) · v [Homogeneity property]

(d) v · v ≥ 0 and v · v = 0 if and only if v = 0 [Positivity property]

We will prove parts (c) and (d) and leave the other proofs as exercises.

Proof (c) Let u = (u₁, u₂, ..., uₙ) and v = (v₁, v₂, ..., vₙ). Then

$$\begin{aligned}
k(\mathbf{u} \cdot \mathbf{v}) &= k(u_1v_1 + u_2v_2 + \cdots + u_nv_n) \\
&= (ku_1)v_1 + (ku_2)v_2 + \cdots + (ku_n)v_n = (k\mathbf{u}) \cdot \mathbf{v}
\end{aligned}$$

Proof (d) The result follows from parts (a) and (b) of Theorem 3.2.1 and the fact that

$$\mathbf{v} \cdot \mathbf{v} = v_1v_1 + v_2v_2 + \cdots + v_nv_n = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2$$


The next theorem gives additional properties of dot products. The proofs can be obtained either by expressing 
the vectors in terms of components or by using the algebraic properties established in Theorem 3.2.2. 


THEOREM 3.2.3 

If u, v, and w are vectors in Rⁿ, and if k is a scalar, then:

(a) 0 · v = v · 0 = 0

(b) (u + v) · w = u · w + v · w

(c) u · (v − w) = u · v − u · w

(d) (u − v) · w = u · w − v · w

(e) k(u · v) = u · (kv)

We will show how Theorem 3.2.2 can be used to prove part (b) without breaking the vectors into components.
The other proofs are left as exercises. 

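Proof (b)

$$\begin{aligned}
(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} &= \mathbf{w} \cdot (\mathbf{u} + \mathbf{v}) && \text{[By symmetry]} \\
&= \mathbf{w} \cdot \mathbf{u} + \mathbf{w} \cdot \mathbf{v} && \text{[By distributivity]} \\
&= \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w} && \text{[By symmetry]}
\end{aligned}$$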


Formulas 18 and 19 together with Theorems 3.2.2 and 3.2.3 make it possible to manipulate expressions 
involving dot products using familiar algebraic techniques. 

EXAMPLE 8 Calculating with Dot Products 

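$$\begin{aligned}
(\mathbf{u} - 2\mathbf{v}) \cdot (3\mathbf{u} + 4\mathbf{v}) &= \mathbf{u} \cdot (3\mathbf{u} + 4\mathbf{v}) - 2\mathbf{v} \cdot (3\mathbf{u} + 4\mathbf{v}) \\
&= 3(\mathbf{u} \cdot \mathbf{u}) + 4(\mathbf{u} \cdot \mathbf{v}) - 6(\mathbf{v} \cdot \mathbf{u}) - 8(\mathbf{v} \cdot \mathbf{v}) \\
&= 3\|\mathbf{u}\|^2 - 2(\mathbf{u} \cdot \mathbf{v}) - 8\|\mathbf{v}\|^2
\end{aligned}$$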
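A symbolic identity like the one in Example 8 can be spot-checked by plugging in concrete vectors. The Python sketch below does this with two sample vectors chosen purely for illustration (they do not come from the text); a matching result is a sanity check, not a proof.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lin(a, u, b, v):
    """Return the linear combination a*u + b*v."""
    return tuple(a * x + b * y for x, y in zip(u, v))

# Arbitrary sample vectors chosen for illustration (not from the text).
u, v = (1, 2, 3), (4, -1, 0)

lhs = dot(lin(1, u, -2, v), lin(3, u, 4, v))          # (u - 2v) . (3u + 4v)
rhs = 3 * dot(u, u) - 2 * dot(u, v) - 8 * dot(v, v)   # 3||u||^2 - 2(u.v) - 8||v||^2
print(lhs, rhs)   # -98 -98
```

Integer components keep every quantity exact, so the two sides can be compared with `==` rather than a tolerance.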


Cauchy-Schwarz Inequality and Angles in Rⁿ


Our next objective is to extend to Rⁿ the notion of “angle” between nonzero vectors u and v. We will do this
by starting with the formula

$$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right) \qquad (20)$$

which we previously derived for nonzero vectors in R² and R³. Since dot products and norms have been
defined for vectors in Rⁿ, it would seem that this formula has all the ingredients to serve as a definition of the
angle θ between two vectors, u and v, in Rⁿ. However, there is a fly in the ointment, the problem being that the
inverse cosine in Formula 20 is not defined unless its argument satisfies the inequalities

$$-1 \le \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \qquad (21)$$

Fortunately, these inequalities do hold for all nonzero vectors in Rⁿ as a result of the following fundamental
result known as the Cauchy-Schwarz inequality.


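Cauchy-Schwarz Inequality

If u = (u₁, u₂, ..., uₙ) and v = (v₁, v₂, ..., vₙ) are vectors in Rⁿ, then

$$|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\| \qquad (22)$$

or in terms of components

$$|u_1v_1 + u_2v_2 + \cdots + u_nv_n| \le \left(u_1^2 + u_2^2 + \cdots + u_n^2\right)^{1/2}\left(v_1^2 + v_2^2 + \cdots + v_n^2\right)^{1/2} \qquad (23)$$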


We will omit the proof of this theorem because later in the text we will prove a more general version of which
this will be a special case. Our goal for now will be to use this theorem to prove that the inequalities in (21) hold
for all nonzero vectors in Rⁿ. Once that is done we will have established all the results required to use Formula
20 as our definition of the angle between nonzero vectors u and v in Rⁿ.

To prove that the inequalities in (21) hold for all nonzero vectors in Rⁿ, divide both sides of Formula 22 by the
product ||u|| ||v|| to obtain

$$\frac{|\mathbf{u} \cdot \mathbf{v}|}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1$$

or equivalently

$$\left|\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right| \le 1$$

from which (21) follows.
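With (21) established, Formula 20 can serve as a working definition of angle in Rⁿ. The following Python sketch is one possible implementation, with the hypothetical helper name `angle`. In exact arithmetic Cauchy-Schwarz keeps the ratio in [−1, 1]; in floating point, round-off can push it a hair outside that interval (for example, for parallel vectors), so the sketch clamps the ratio before calling `math.acos`.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle(u, v):
    """Angle in radians between nonzero u and v in R^n, using Formula 20.

    Cauchy-Schwarz guarantees the ratio lies in [-1, 1] exactly; the clamp only
    guards against floating-point round-off pushing it slightly outside.
    """
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        raise ValueError("the angle is defined only for nonzero vectors")
    assert abs(dot(u, v)) <= nu * nv * (1 + 1e-9)   # inequality (22), up to round-off
    ratio = dot(u, v) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, ratio)))

print(angle((1, 0, 0, 0), (0, 1, 0, 0)))   # pi/2
print(angle((1, 2, 3, 4), (2, 4, 6, 8)))   # approximately 0 (same direction)
```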











Hermann Amandus Schwarz (1843-1921) 



Viktor Yakovlevich Bunyakovsky (1804-1889) 


The Cauchy—Schwarz inequality is named in honor of the French mathematician 
Augustin Cauchy (see p. 109) and the German mathematician Hermann Schwarz. Variations of this 
inequality occur in many different settings and under various names. Depending on the context in 
which the inequality occurs, you may find it called Cauchy's inequality, the Schwarz inequality, or 
sometimes even the Bunyakovsky inequality, in recognition of the Russian mathematician who 
published his version of the inequality in 1859, about 25 years before Schwarz. 

[Images: wikipedia (Schwarz); wikipedia (Bunyakovsky)] 


Geometry in Rⁿ

Earlier in this section we extended various concepts to Rⁿ with the idea that familiar results that we can
visualize in R² and R³ might be valid in Rⁿ as well. Here are two fundamental theorems from plane geometry
whose validity extends to Rⁿ:

The sum of the lengths of two sides of a triangle is at least as large as the third (Figure 3.2.8).

The shortest distance between two points is a straight line (Figure 3.2.9).

The following theorem generalizes these theorems to Rⁿ.




THEOREM 3.2.5 


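If u, v, and w are vectors in Rⁿ, and if k is any scalar, then:

(a) ||u + v|| ≤ ||u|| + ||v|| [Triangle inequality for vectors]

(b) d(u, v) ≤ d(u, w) + d(w, v) [Triangle inequality for distances]

Proof (a)

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 &= (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = (\mathbf{u} \cdot \mathbf{u}) + 2(\mathbf{u} \cdot \mathbf{v}) + (\mathbf{v} \cdot \mathbf{v}) \\
&= \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2 \\
&\le \|\mathbf{u}\|^2 + 2|\mathbf{u} \cdot \mathbf{v}| + \|\mathbf{v}\|^2 && \text{Property of absolute value} \\
&\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2 && \text{Cauchy-Schwarz inequality} \\
&= \left(\|\mathbf{u}\| + \|\mathbf{v}\|\right)^2
\end{aligned}$$

Taking square roots of both sides (both sides are nonnegative) gives ||u + v|| ≤ ||u|| + ||v||.

Proof (b) It follows from part (a) and Formula 11 that

$$\begin{aligned}
d(\mathbf{u}, \mathbf{v}) &= \|\mathbf{u} - \mathbf{v}\| = \|(\mathbf{u} - \mathbf{w}) + (\mathbf{w} - \mathbf{v})\| \\
&\le \|\mathbf{u} - \mathbf{w}\| + \|\mathbf{w} - \mathbf{v}\| = d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})
\end{aligned}$$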


Figure 3.2.8 ||u + v|| ≤ ||u|| + ||v||

Figure 3.2.9 d(u, v) ≤ d(u, w) + d(w, v)
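Both parts of Theorem 3.2.5 can be spot-checked on random vectors. The Python sketch below is an illustration under stated assumptions: random triples in R⁵ drawn with a fixed seed, and a small tolerance to absorb floating-point round-off. Passing such a test is evidence, not a proof; the proof is the argument given above.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def distance(u, v):
    return norm(sub(u, v))            # Formula 11

random.seed(0)                        # reproducible sample vectors
TOL = 1e-9                            # slack for floating-point round-off
for _ in range(1000):
    u, v, w = (tuple(random.uniform(-10, 10) for _ in range(5)) for _ in range(3))
    assert norm(add(u, v)) <= norm(u) + norm(v) + TOL                  # part (a)
    assert distance(u, v) <= distance(u, w) + distance(w, v) + TOL     # part (b)
print("both inequalities held for 1000 random triples in R^5")
```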

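It is proved in plane geometry that for any parallelogram the sum of the squares of the diagonals is equal to the
sum of the squares of the four sides (Figure 3.2.10). The following theorem generalizes that result to Rⁿ.

Parallelogram Equation for Vectors

If u and v are vectors in Rⁿ, then

$$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right) \qquad (24)$$

Proof

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 &= (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) + (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) \\
&= 2(\mathbf{u} \cdot \mathbf{u}) + 2(\mathbf{v} \cdot \mathbf{v}) \\
&= 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right)
\end{aligned}$$

Figure 3.2.10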
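The parallelogram equation (24) involves only squared norms, which are dot products (Formula 18), so integer vectors let us check it exactly. The vectors in this Python sketch are illustrative samples, not taken from the text, and the helper name `combine` is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def combine(u, v, sign):
    """Return u + v when sign is +1 and u - v when sign is -1."""
    return tuple(a + sign * b for a, b in zip(u, v))

# Sample integer vectors in R^4 (illustrative only); squared norms stay exact.
u, v = (1, 2, -1, 3), (2, 0, 5, -1)
s, d = combine(u, v, +1), combine(u, v, -1)
lhs = dot(s, s) + dot(d, d)            # ||u + v||^2 + ||u - v||^2
rhs = 2 * (dot(u, u) + dot(v, v))      # 2(||u||^2 + ||v||^2)
print(lhs, rhs)                        # 90 90
```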


We could state and prove many more theorems from plane geometry that generalize to Rⁿ, but the ones already
given should suffice to convince you that Rⁿ is not so different from R² and R³ even though we cannot
visualize it directly. The next theorem establishes a fundamental relationship between the dot product and norm
in Rⁿ.




THEOREM 3.2.7 


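If u and v are vectors in Rⁿ with the Euclidean inner product, then

$$\mathbf{u} \cdot \mathbf{v} = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2 \qquad (25)$$

Proof

$$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$$

$$\|\mathbf{u} - \mathbf{v}\|^2 = (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) = \|\mathbf{u}\|^2 - 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$$

Subtracting the second equation from the first gives ||u + v||² − ||u − v||² = 4(u · v), from which (25) follows.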

Note that Formula 25 expresses the dot product 
in terms of norms. 
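Formula 25 lets us recover a dot product from norms alone. The Python sketch below checks this on sample integer vectors (chosen only for illustration), using `fractions.Fraction` so that the division by 4 stays exact; `dot_from_norms` is a hypothetical helper name.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dot_from_norms(u, v):
    """Recover u . v from the squared norms ||u + v||^2 and ||u - v||^2 (Formula 25)."""
    s = tuple(a + b for a, b in zip(u, v))    # u + v
    d = tuple(a - b for a, b in zip(u, v))    # u - v
    return Fraction(dot(s, s), 4) - Fraction(dot(d, d), 4)

u, v = (1, 2, -1, 3), (2, 0, 5, -1)   # illustrative vectors
print(dot(u, v), dot_from_norms(u, v))   # -6 -6
```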


Dot Products as Matrix Multiplication 

There are various ways to express the dot product of vectors using matrix notation. The formulas depend on
whether the vectors are expressed as row matrices or column matrices. The possibilities are summarized in Table 1.

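If A is an n × n matrix and u and v are n × 1 matrices, then it follows from the first row in Table 1 and
properties of the transpose that

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T(A\mathbf{u}) = (\mathbf{v}^TA)\mathbf{u} = (A^T\mathbf{v})^T\mathbf{u} = \mathbf{u} \cdot A^T\mathbf{v}$$

$$\mathbf{u} \cdot A\mathbf{v} = (A\mathbf{v})^T\mathbf{u} = (\mathbf{v}^TA^T)\mathbf{u} = \mathbf{v}^T(A^T\mathbf{u}) = A^T\mathbf{u} \cdot \mathbf{v}$$

The resulting formulas

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v} \qquad (26)$$

$$\mathbf{u} \cdot A\mathbf{v} = A^T\mathbf{u} \cdot \mathbf{v} \qquad (27)$$

provide an important link between multiplication by an n × n matrix A and multiplication by Aᵀ.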

EXAMPLE 9 Verifying That Au · v = u · Aᵀv


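Suppose that

$$A = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \quad \mathbf{u} = \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix}$$

Then

$$A\mathbf{u} = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \\ 5 \end{bmatrix}$$

$$A^T\mathbf{v} = \begin{bmatrix} 1 & 2 & -1 \\ -2 & 4 & 0 \\ 3 & 1 & 1 \end{bmatrix}\begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} -7 \\ 4 \\ -1 \end{bmatrix}$$

from which we obtain

$$A\mathbf{u} \cdot \mathbf{v} = 7(-2) + 10(0) + 5(5) = 11$$

$$\mathbf{u} \cdot A^T\mathbf{v} = (-1)(-7) + 2(4) + 4(-1) = 11$$

Thus, Au · v = u · Aᵀv as guaranteed by Formula 26. We leave it for you to verify that Formula
27 also holds.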
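The arithmetic of Example 9 can be reproduced with plain Python lists, with matrices stored as lists of rows. The helpers `matvec` and `transpose` below are illustrative names, not library functions. The last line checks Formula 27 for the same A, u, and v without printing the common value, so the verification left to the reader is still yours to do by hand.

```python
def matvec(A, x):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, -2, 3],
     [2, 4, 1],
     [-1, 0, 1]]
u = [-1, 2, 4]
v = [-2, 0, 5]

print(dot(matvec(A, u), v), dot(u, matvec(transpose(A), v)))          # 11 11  (Formula 26)
print(dot(u, matvec(A, v)) == dot(matvec(transpose(A), u), v))        # True   (Formula 27)
```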


Table 1

Form | Dot Product | Example
u a column matrix and v a column matrix | u · v = uᵀv = vᵀu | u = [1 −3 5]ᵀ, v = [5 4 0]ᵀ: uᵀv = −7 and vᵀu = −7
u a row matrix and v a column matrix | u · v = uv = vᵀuᵀ | u = [1 −3 5], v = [5 4 0]ᵀ: uv = −7 and vᵀuᵀ = −7
u a column matrix and v a row matrix | u · v = vu = uᵀvᵀ | u = [1 −3 5]ᵀ, v = [5 4 0]: vu = −7 and uᵀvᵀ = −7
u a row matrix and v a row matrix | u · v = uvᵀ = vuᵀ | u = [1 −3 5], v = [5 4 0]: uvᵀ = −7 and vuᵀ = −7

In every row the product evaluates to (1)(5) + (−3)(4) + (5)(0) = −7.



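A Dot Product View of Matrix Multiplication

Dot products provide another way of thinking about matrix multiplication. Recall that if A = [aᵢⱼ] is an m × r
matrix and B = [bᵢⱼ] is an r × n matrix, then the ijth entry of AB is

$$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj}$$

which is the dot product of the ith row vector of A

$$\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ir} \end{bmatrix}$$

and the jth column vector of B

$$\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{rj} \end{bmatrix}$$

Thus, if the row vectors of A are r₁, r₂, ..., rₘ and the column vectors of B are c₁, c₂, ..., cₙ, then the matrix
product AB can be expressed as

$$AB = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{c}_1 & \mathbf{r}_1 \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{c}_n \\ \mathbf{r}_2 \cdot \mathbf{c}_1 & \mathbf{r}_2 \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{c}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_m \cdot \mathbf{c}_1 & \mathbf{r}_m \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_m \cdot \mathbf{c}_n \end{bmatrix} \qquad (28)$$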
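Formula 28 describes matrix multiplication as a table of row-by-column dot products, which translates almost word for word into code. This is a minimal pure-Python sketch (no external libraries assumed); the matrices are sample values chosen for illustration, and `matmul_by_dots` is a hypothetical name.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matmul_by_dots(A, B):
    """AB computed entry by entry as (row i of A) . (column j of B), Formula 28."""
    if len(A[0]) != len(B):
        raise ValueError("A must be m x r and B must be r x n")
    rows = A                      # r_1, ..., r_m
    cols = list(zip(*B))          # c_1, ..., c_n
    return [[dot(r, c) for c in cols] for r in rows]

A = [[1, 2, 4],
     [2, 6, 0]]                   # 2 x 3 sample matrix
B = [[4, 1, 4, 3],
     [0, -1, 3, 1],
     [2, 7, 5, 2]]                # 3 x 4 sample matrix
print(matmul_by_dots(A, B))       # [[12, 27, 30, 13], [8, -4, 26, 12]]
```

The size check mirrors the requirement that A be m × r and B be r × n, which is exactly what makes each row of A and column of B the same length so that their dot product is defined.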


Application of Dot Products to ISBN Numbers 

Although the system has recently changed, most books published in the last 25 years have been 
assigned a unique 10-digit number called an International Standard Book Number or ISBN. The first 
nine digits of this number are split into three groups—the first group representing the country or group 
of countries in which the book originates, the second identifying the publisher, and the third assigned to 
the book title itself. The tenth and final digit, called a check digit, is computed from the first nine digits
and is used to ensure that an electronic transmission of the ISBN, say over the Internet, occurs without 
error. 

To explain how this is done, regard the first nine digits of the ISBN as a vector b in R⁹ and let a be the
vector

a = (1, 2, 3, 4, 5, 6, 7, 8, 9)

Then the check digit c is computed using the following procedure:

Form the dot product a · b.

Divide a · b by 11, thereby producing a remainder c that is an integer between 0 and 10, inclusive.
The check digit is taken to be c, with the proviso that c = 10 is written as X to avoid double digits.

For example, the ISBN of the brief edition of Calculus, sixth edition, by Howard Anton is

0-471-15307-9

which has a check digit of 9. This is consistent with the first nine digits of the ISBN, since

a · b = (1, 2, 3, 4, 5, 6, 7, 8, 9) · (0, 4, 7, 1, 1, 5, 3, 0, 7) = 152

Dividing 152 by 11 produces a quotient of 13 and a remainder of 9, so the check digit is c = 9. If an
electronic order is placed for a book with a certain ISBN, then the warehouse can use the above
procedure to verify that the check digit is consistent with the first nine digits, thereby reducing the
possibility of a costly shipping error.


Concept Review 

Norm (or length or magnitude) of a vector 

Unit vector 

Normalized vector 

Standard unit vectors 

Distance between points in Rⁿ

Angle between two vectors in Rⁿ

Dot product (or Euclidean inner product) of two vectors in Rⁿ
Cauchy-Schwarz inequality
Triangle inequality
Parallelogram equation for vectors

Skills

Compute the norm of a vector in Rⁿ.

Determine whether a given vector in Rⁿ is a unit vector.

Normalize a nonzero vector in Rⁿ.

Determine the distance between two vectors in Rⁿ.

Compute the dot product of two vectors in Rⁿ.

Compute the angle between two nonzero vectors in Rⁿ.

Prove basic properties pertaining to norms and dot products (Theorems 3.2.1-3.2.3 and 3.2.5-3.2.7). 


Exercise Set 3.2 


In Exercises 1-2, find the norm of v, a unit vector that has the same direction as v, and a unit vector that is 
oppositely directed to v. 


1. (a) v = (4, −3)

(b) v = (2, 2, 2)

(c) v = (1, 0, 2, 1, 3)

Answer:

2. (a) v = (−5, 12)

(b) v = (1, −1, 2)

(c) v = (−2, 3, 3, −1)

In Exercises 3-4, evaluate the given expression with u = (2, −2, 3), v = (1, −3, 4), and
w = (3, 6, −4).

3. (a) ||u + v||

(b) ||u|| + ||v||

(c) ||−2u + 2v||

(d) ||3u − 5v + w||

Answer:

(a) ||u + v|| = √83

(b) ||u|| + ||v|| = √17 + √26

(c) ||−2u + 2v|| = 2√3

(d) ||3u − 5v + w|| = √466

4. (a) ||u + v + w||

(b) ||u − v||

(c) ||3v|| − 3||v||

(d) ||u|| − ||v||

In Exercises 5-6, evaluate the given expression with u = (−2, −1, 4, 5), v = (3, 1, −5, 7), and
w = (−6, 2, 1, 1).

5. (a) ||3u − 5v + w||

(b) ||3u|| − 5||v|| + ||w||

(c) ||−||u|| v||

Answer:

(a) ||3u − 5v + w|| = √2570

(b) ||3u|| − 5||v|| + ||w|| = 3√46 − 10√21 + √42

(c) ||−||u|| v|| = 2√966

6. (a) ||u|| − 2||v|| − 3||w||

(b) ||u|| + ||−2v|| + ||−3w||

(c) || ||u − v|| w ||

7. Let v = (−2, 3, 0, 6). Find all scalars k such that ||kv|| = 5.


Answer: 



8. Let v = (1, 1, 2, −3, 1). Find all scalars k such that ||kv|| = 4.
In Exercises 9-10, find u · v, u · u, and v · v.

9. (a) u = (3, 1, 4), v = (2, 2, −4)

(b) u = (1, 1, 4, 6), v = (2, −2, 3, −2)

Answer:

(a) u · v = −8, u · u = 26, v · v = 24
(b) u · v = 0, u · u = 54, v · v = 21

10. (a) u = (1, 1, −2, 3), v = (−1, 0, 5, 1)

(b) u = (2, −1, 1, 0, −2), v = (1, 2, 2, 2, 1)

In Exercises 11-12, find the Euclidean distance between u and v.

11. (a) u = (3, 3, 3), v = (1, 0, 4)

(b) u = (0, −2, −1, 1), v = (−3, 2, 4, 4)

(c) u = (3, −3, −2, 0, −3, 13, 5),
v = (−4, 1, −1, 5, 0, −11, 4)

Answer:

(a) ||u − v|| = √14

(b) ||u − v|| = √59

(c) ||u − v|| = √677

12. (a) u = (1, 2, −3, 0), v = (5, 1, 2, −2)

(b) u = (2, −1, −4, 1, 0, 6, −3, 1),
v = (−2, −1, 0, 3, 7, 2, −5, 1)

(c) u = (0, 1, 1, 1, 2), v = (2, 1, 0, −1, 3)

13. Find the cosine of the angle between the vectors in each part of Exercise 11, and then state whether the
angle is acute, obtuse, or 90°.

Answer:

(a) cos θ = 5/√51; θ is acute

(b) cos θ = −4/(3√30); θ is obtuse

(c) cos θ = −68/(45√5); θ is obtuse

14. Find the cosine of the angle between the vectors in each part of Exercise 12, and then state whether the 
angle is acute, obtuse, or 90°. 

15. Suppose that a vector a in the xy-plane has a length of 9 units and points in a direction that is 120°
counterclockwise from the positive x-axis, and a vector b in that plane has a length of 5 units and points in
the positive y-direction. Find a · b.

Answer:

a · b = 45√3/2

16. Suppose that a vector a in the xy-plane points in a direction that is 47° counterclockwise from the positive 
x-axis, and a vector b in that plane points in a direction that is 43° clockwise from the positive x-axis. What 
can you say about the value of a • b? 

In Exercises 17-18, determine whether the expression makes sense mathematically. If not, explain why.

17. (a) u · (v · w)

(b) u · (v + w)

(c) ||u · v||

(d) (u · v) − ||u||

Answer:

(a) u · (v · w) does not make sense because v · w is a scalar.

(b) u · (v + w) makes sense.

(c) ||u · v|| does not make sense because the quantity inside the norm is a scalar.

(d) (u · v) − ||u|| makes sense since the terms are both scalars.

18. (a) ||u|| · ||v||

(b) (u-v)-w 

(c) (u · v) − k

(d) k · u

19. Find a unit vector that has the same direction as the given vector.

(a) (−4, −3)

(b) (1, 7)

(c) (−3, 2, √3)

(d) (1, 2, 3, 4, 5)

Answer:

(a) (−4/5, −3/5)

(b) (1/(5√2), 7/(5√2))

(c) (−3/4, 1/2, √3/4)

(d) (1/√55, 2/√55, 3/√55, 4/√55, 5/√55)


20. Find a unit vector that is oppositely directed to the given vector.

(a) (−12, −5)

(b) (3, −3, −3)

(c) (−6, 8)

(d) (−3, 1, √6, 3)

21. State a procedure for finding a vector of a specified length m that points in the same direction as a given 
vector v. 

22. If ||v|| = 2 and ||w|| = 3, what are the largest and smallest values possible for ||v − w||? Give a geometric
explanation of your results. 

23. Find the cosine of the angle θ between u and v.

(a) u = (2, 3), v = (5, −7)

(b) u = (−6, −2), v = (4, 0)

(c) u = (1, −5, 4), v = (3, 3, 3)

(d) u = (−2, 2, 3), v = (1, 7, −4)

Answer:

(a) cos θ = −11/√962

(b) cos θ = −3/√10

(c) cos θ = 0

(d) cos θ = 0

24. Find the radian measure of the angle θ (with 0 ≤ θ ≤ π) between u and v.

(a) (1, −7) and (21, 3)

(b) (0, 2) and (3, −3)

(c) (−1, 1, 0) and (0, −1, 1)

(d) (1, −1, 0) and (1, 0, 0)

In Exercises 25-26, verify that the Cauchy-Schwarz inequality holds.

25. (a) u = (3, 2), v = (4, −1)

(b) u = (−3, 1, 0), v = (2, −1, 3)

(c) u = (0, 2, 2, 1), v = (1, 1, 1, 1)

Answer:

(a) |u · v| = 10, ||u|| ||v|| = √13 √17 ≈ 14.866

(b) |u · v| = 7, ||u|| ||v|| ≈ 11.832

(c) |u · v| = 5, ||u|| ||v|| = (3)(2) = 6

26. (a) u = (4, 1, 1), v = (1, 2, 3)

(b) u = (1, 2, 1, 2, 3), v = (0, 1, 1, 5, −2)

(c) u = (1, 3, 5, 2, 0, 1), v = (0, 2, 4, 1, 3, 5)

27. Let p₀ = (x₀, y₀, z₀) and p = (x, y, z). Describe the set of all points (x, y, z) for which ||p − p₀|| = 1.
Answer:

A sphere of radius 1 centered at (x₀, y₀, z₀).

28. (a) Show that the components of the vector v = (v₁, v₂) in Figure Ex-28a are v₁ = ||v|| cos θ and
v₂ = ||v|| sin θ.

(b) Let u and v be the vectors in Figure Ex-28b. Use the result in part (a) to find the components of
4u − 5v.

Figure Ex-28


29. Prove parts (a) and (b) of Theorem 3.2.1.

30. Prove parts (a) and (c) of Theorem 3.2.3. 

31. Prove parts (d) and (e) of Theorem 3.2.3. 

32. Under what conditions will the triangle inequality (Theorem 3.2.5a) be an equality? Explain your answer 
geometrically. 

33. What can you say about two nonzero vectors, u and v, that satisfy the equation ||u + v|| = ||u|| + ||v||?

34* (a) What relationship must hold for the point p = (a, b, c ) to be equidistant from the origin and the 

xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and 
c. 

(b) What relationship must hold for the point p = (a, b, c) to be farther from the origin than from the 
xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and 

c 


True-False Exercises 


In parts (a)-(j) determine whether the statement is true or false, and justify your answer.

(a) If each component of a vector in $R^3$ is doubled, the norm of that vector is doubled.

Answer:

True

(b) In $R^2$, the vectors of norm 5 whose initial points are at the origin have terminal points lying on a circle of radius 5 centered at the origin.

Answer:

True

(c) Every vector in $R^n$ has a positive norm.

Answer:

False

(d) If v is a nonzero vector in $R^n$, there are exactly two unit vectors that are parallel to v.

Answer:

True

(e) If $\|u\| = 2$, $\|v\| = 1$, and $u \cdot v = 1$, then the angle between u and v is $\pi/3$ radians.

Answer:

True

(f) The expressions $(u \cdot v) + w$ and $u \cdot (v + w)$ are both meaningful and equal to each other.

Answer:

False

(g) If $u \cdot v = u \cdot w$, then v = w.

Answer:

False

(h) If $u \cdot v = 0$, then either u = 0 or v = 0.

Answer:

False

(i) In $R^2$, if u lies in the first quadrant and v lies in the third quadrant, then $u \cdot v$ cannot be positive.

Answer:

True

(j) For all vectors u, v, and w in $R^n$, we have

$$\|u + v + w\| \le \|u\| + \|v\| + \|w\|$$

Answer:

True
3.3 Orthogonality

In the last section we defined the notion of “angle” between vectors in $R^n$. In this section we will focus on the notion of “perpendicularity.” Perpendicular vectors in $R^n$ play an important role in a wide variety of applications.


Orthogonal Vectors 


Recall from Formula 20 in the previous section that the angle $\theta$ between two nonzero vectors u and v in $R^n$ is defined by the formula

$$\theta = \cos^{-1}\left(\frac{u \cdot v}{\|u\|\,\|v\|}\right)$$

It follows from this that $\theta = \pi/2$ if and only if $u \cdot v = 0$. Thus, we make the following definition.


DEFINITION 1

Two nonzero vectors u and v in $R^n$ are said to be orthogonal (or perpendicular) if $u \cdot v = 0$. We will also agree that the zero vector in $R^n$ is orthogonal to every vector in $R^n$. A nonempty set of vectors in $R^n$ is called an orthogonal set if all pairs of distinct vectors in the set are orthogonal. An orthogonal set of unit vectors is called an orthonormal set.


EXAMPLE 1 Orthogonal Vectors 

(a) Show that u = (-2, 3, 1, 4) and v = (1, 2, 0, -1) are orthogonal vectors in $R^4$.

(b) Show that the set S = {i, j, k} of standard unit vectors is an orthogonal set in $R^3$.

Solution

(a) The vectors are orthogonal since

$$u \cdot v = (-2)(1) + (3)(2) + (1)(0) + (4)(-1) = 0$$

(b) We must show that all pairs of distinct vectors are orthogonal, that is,

$$i \cdot j = i \cdot k = j \cdot k = 0$$

This is evident geometrically (Figure 3.2.2), but it can be seen as well from the computations

$$i \cdot j = (1, 0, 0) \cdot (0, 1, 0) = 0$$
$$i \cdot k = (1, 0, 0) \cdot (0, 0, 1) = 0$$
$$j \cdot k = (0, 1, 0) \cdot (0, 0, 1) = 0$$


In Example 1 there is no need to check that $j \cdot i = k \cdot i = k \cdot j = 0$, since this follows from the computations in the example and the symmetry property of the dot product.
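The orthogonality test in Definition 1 is mechanical enough to automate. Below is a minimal Python sketch; the helper names `dot` and `is_orthogonal_set` are our own, not from any library. Following the remark about symmetry, it checks only pairs with i < j. It reproduces both parts of Example 1.

```python
def dot(u, v):
    """Euclidean inner product of two vectors with the same number of components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(vectors):
    """True if every pair of distinct vectors has dot product 0.

    By symmetry of the dot product, only pairs (i, j) with i < j are checked.
    """
    return all(
        dot(vectors[i], vectors[j]) == 0
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )


u = (-2, 3, 1, 4)
v = (1, 2, 0, -1)
print(dot(u, v))                      # 0, so u and v are orthogonal

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(is_orthogonal_set([i, j, k]))   # True
```

Integer inputs keep the test exact. With floating-point data, a tolerance would be needed in place of `== 0`.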


Lines and Planes Determined by Points and Normals 



One learns in analytic geometry that a line in $R^2$ is determined uniquely by its slope and one of its points, and that a plane in $R^3$ is determined uniquely by its “inclination” and one of its points. One way of specifying slope and inclination is to use a nonzero vector n, called a normal, that is orthogonal to the line or plane in question. For example, Figure 3.3.1 shows the line through the point $P_0(x_0, y_0)$ that has normal n = (a, b) and the plane through the point $P_0(x_0, y_0, z_0)$ that has normal n = (a, b, c). Both the line and the plane are represented by the vector equation

$$n \cdot \overrightarrow{P_0P} = 0 \qquad (1)$$

where P is either an arbitrary point (x, y) on the line or an arbitrary point (x, y, z) in the plane. The vector $\overrightarrow{P_0P}$ can be expressed in terms of components as

$$\overrightarrow{P_0P} = (x - x_0,\ y - y_0) \qquad \text{[line]}$$
$$\overrightarrow{P_0P} = (x - x_0,\ y - y_0,\ z - z_0) \qquad \text{[plane]}$$

and hence Equation 1 can be written as

$$a(x - x_0) + b(y - y_0) = 0 \qquad \text{[line]} \qquad (2)$$
$$a(x - x_0) + b(y - y_0) + c(z - z_0) = 0 \qquad \text{[plane]} \qquad (3)$$

These are called the point-normal equations of the line and plane.

EXAMPLE 2 Point-Normal Equations

It follows from Formula 2 that in $R^2$ the equation

$$6(x - 3) + (y + 7) = 0$$

represents the line through the point (3, -7) with normal n = (6, 1); and it follows from Formula 3 that in $R^3$ the equation

$$4(x - 3) + 2y - 5(z - 7) = 0$$

represents the plane through the point (3, 0, 7) with normal n = (4, 2, -5).
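Multiplying out a point-normal equation is easy to script. Below is a minimal Python sketch; `point_normal_to_general` is a hypothetical helper name. It rewrites $n \cdot (x - p_0) = 0$ as $n \cdot x + d = 0$ with $d = -n \cdot p_0$, which is the multiplied-out form covered by Theorem 3.3.1. It is applied to both equations of Example 2.

```python
def point_normal_to_general(n, p0):
    """Expand n . (x - p0) = 0 into n . x + d = 0 and return (n, d).

    n  -- normal vector, (a, b) for a line or (a, b, c) for a plane
    p0 -- a point on the line or plane, with the same length as n
    """
    if len(n) != len(p0):
        raise ValueError("n and p0 must have the same length")
    d = -sum(ni * pi for ni, pi in zip(n, p0))
    return tuple(n), d


# Line of Example 2: 6(x - 3) + (y + 7) = 0, i.e. 6x + y - 11 = 0
print(point_normal_to_general((6, 1), (3, -7)))        # ((6, 1), -11)

# Plane of Example 2: 4(x - 3) + 2y - 5(z - 7) = 0, i.e. 4x + 2y - 5z + 23 = 0
print(point_normal_to_general((4, 2, -5), (3, 0, 7)))  # ((4, 2, -5), 23)
```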



Figure 3.3.1 (normal n = (a, b, c) with its initial point at $P_0(x_0, y_0, z_0)$)


When convenient, the terms in Equations 2 and 3 can be multiplied out and the constants combined. This leads to the following 
theorem. 


THEOREM 3.3.1

(a) If a and b are constants that are not both zero, then an equation of the form

$$ax + by + c = 0 \qquad (4)$$

represents a line in $R^2$ with normal n = (a, b).

(b) If a, b, and c are constants that are not all zero, then an equation of the form

$$ax + by + cz + d = 0 \qquad (5)$$

represents a plane in $R^3$ with normal n = (a, b, c).


EXAMPLE 3 Vectors Orthogonal to Lines and Planes Through the Origin

(a) The equation ax + by = 0 represents a line through the origin in $R^2$. Show that the vector $n_1 = (a, b)$ formed from the coefficients of the equation is orthogonal to the line, that is, orthogonal to every vector along the line.

(b) The equation ax + by + cz = 0 represents a plane through the origin in $R^3$. Show that the vector $n_2 = (a, b, c)$ formed from the coefficients of the equation is orthogonal to the plane, that is, orthogonal to every vector that lies in the plane.

Solution

We will solve both problems together. The two equations can be written as

$$(a, b) \cdot (x, y) = 0 \quad\text{and}\quad (a, b, c) \cdot (x, y, z) = 0$$

or, alternatively, as

$$n_1 \cdot (x, y) = 0 \quad\text{and}\quad n_2 \cdot (x, y, z) = 0$$

These equations show that $n_1$ is orthogonal to every vector (x, y) on the line and that $n_2$ is orthogonal to every vector (x, y, z) in the plane (Figure 3.3.1).


Recall that

$$ax + by = 0 \quad\text{and}\quad ax + by + cz = 0$$

are called homogeneous equations. Example 3 illustrates that homogeneous equations in two or three unknowns can be written in the vector form

$$n \cdot x = 0 \qquad (6)$$

where n is the vector of coefficients and x is the vector of unknowns. In $R^2$ this is called the vector form of a line through the origin, and in $R^3$ it is called the vector form of a plane through the origin.

Referring to Table 1 of Section 3.2, in what other ways can you write Formula 6 if n and x are expressed in matrix form?


Orthogonal Projections 

In many applications it is necessary to “decompose” a vector u into a sum of two terms, one term being a scalar multiple of a specified nonzero vector a and the other term being orthogonal to a. For example, if u and a are vectors in $R^2$ that are positioned so their initial points coincide at a point Q, then we can create such a decomposition as follows (Figure 3.3.2):

Drop a perpendicular from the tip of u to the line through a.

Construct the vector $w_1$ from Q to the foot of the perpendicular.

Construct the vector $w_2 = u - w_1$.

Figure 3.3.2 (panels (a) through (d)): In parts (b) through (d), $u = w_1 + w_2$, where $w_1$ is parallel to a and $w_2$ is orthogonal to a.

Since

$$w_1 + w_2 = w_1 + (u - w_1) = u$$

we have decomposed u into a sum of two orthogonal vectors, the first term being a scalar multiple of a and the second being orthogonal to a.


The following theorem shows that the foregoing results, which we illustrated using vectors in $R^2$, apply as well in $R^n$.


Projection Theorem

If u and a are vectors in $R^n$, and if $a \ne 0$, then u can be expressed in exactly one way in the form $u = w_1 + w_2$, where $w_1$ is a scalar multiple of a and $w_2$ is orthogonal to a.

Proof Since the vector $w_1$ is to be a scalar multiple of a, it must have the form

$$w_1 = ka \qquad (7)$$

Our goal is to find a value of the scalar k and a vector $w_2$ that is orthogonal to a such that

$$u = w_1 + w_2 \qquad (8)$$

We can determine k by using Formula 7 to rewrite Formula 8 as

$$u = w_1 + w_2 = ka + w_2$$

and then applying Theorems 3.2.2 and 3.2.3 to obtain

$$u \cdot a = (ka + w_2) \cdot a = k\|a\|^2 + (w_2 \cdot a) \qquad (9)$$

Since $w_2$ is to be orthogonal to a, the last term in Formula 9 must be 0, and hence k must satisfy the equation

$$u \cdot a = k\|a\|^2$$

from which we obtain

$$k = \frac{u \cdot a}{\|a\|^2}$$

as the only possible value for k. The proof can be completed by rewriting Formula 8 as

$$w_2 = u - w_1 = u - ka = u - \frac{u \cdot a}{\|a\|^2}\,a$$

and then confirming that $w_2$ is orthogonal to a by showing that $w_2 \cdot a = 0$ (we leave the details for you).

The vectors $w_1$ and $w_2$ in the Projection Theorem have associated names: the vector $w_1$ is called the orthogonal projection of u on a, or sometimes the vector component of u along a, and the vector $w_2$ is called the vector component of u orthogonal to a. The vector $w_1$ is commonly denoted by the symbol $\operatorname{proj}_a u$, in which case it follows from Formula 8 that $w_2 = u - \operatorname{proj}_a u$. In summary,

$$\operatorname{proj}_a u = \frac{u \cdot a}{\|a\|^2}\,a \qquad \text{(vector component of u along a)} \qquad (10)$$

$$u - \operatorname{proj}_a u = u - \frac{u \cdot a}{\|a\|^2}\,a \qquad \text{(vector component of u orthogonal to a)} \qquad (11)$$

EXAMPLE 4 Orthogonal Projection on a Line

Find the orthogonal projections of the vectors $e_1 = (1, 0)$ and $e_2 = (0, 1)$ on the line L that makes an angle $\theta$ with the positive x-axis in $R^2$.

Solution

As illustrated in Figure 3.3.3, $a = (\cos\theta, \sin\theta)$ is a unit vector along the line L, so our first problem is to find the orthogonal projection of $e_1$ along a. Since

$$\|a\| = \sqrt{\sin^2\theta + \cos^2\theta} = 1 \quad\text{and}\quad e_1 \cdot a = (1, 0) \cdot (\cos\theta, \sin\theta) = \cos\theta$$

it follows from Formula 10 that this projection is

$$\operatorname{proj}_a e_1 = \frac{e_1 \cdot a}{\|a\|^2}\,a = (\cos\theta)(\cos\theta, \sin\theta) = (\cos^2\theta,\ \sin\theta\cos\theta)$$

Similarly, since $e_2 \cdot a = (0, 1) \cdot (\cos\theta, \sin\theta) = \sin\theta$, it follows from Formula 10 that

$$\operatorname{proj}_a e_2 = \frac{e_2 \cdot a}{\|a\|^2}\,a = (\sin\theta)(\cos\theta, \sin\theta) = (\sin\theta\cos\theta,\ \sin^2\theta)$$
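The closed forms in Example 4 can be spot-checked numerically. The sketch below is a minimal example. It implements Formula 10 as a hypothetical helper `proj` and compares the result with $(\cos^2\theta, \sin\theta\cos\theta)$ and $(\sin\theta\cos\theta, \sin^2\theta)$ for one sample angle. We chose $\theta = \pi/6$ arbitrarily, and a floating-point comparison is only a check, not a proof.

```python
import math


def proj(u, a):
    """Orthogonal projection of u on a nonzero vector a, per Formula 10."""
    aa = sum(x * x for x in a)              # ||a||^2
    if aa == 0:
        raise ValueError("a must be nonzero")
    k = sum(x * y for x, y in zip(u, a)) / aa
    return tuple(k * x for x in a)


theta = math.pi / 6                          # sample angle; any angle would do
a = (math.cos(theta), math.sin(theta))       # unit vector along L
p1 = proj((1, 0), a)
p2 = proj((0, 1), a)

expected1 = (math.cos(theta) ** 2, math.sin(theta) * math.cos(theta))
expected2 = (math.sin(theta) * math.cos(theta), math.sin(theta) ** 2)
print(all(math.isclose(x, y) for x, y in zip(p1, expected1)))  # True
print(all(math.isclose(x, y) for x, y in zip(p2, expected2)))  # True
```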


EXAMPLE 5 Vector Component of u Along a

Let u = (2, -1, 3) and a = (4, -1, 2). Find the vector component of u along a and the vector component of u orthogonal to a.

Solution

$$u \cdot a = (2)(4) + (-1)(-1) + (3)(2) = 15$$
$$\|a\|^2 = 4^2 + (-1)^2 + 2^2 = 21$$

Thus the vector component of u along a is

$$\operatorname{proj}_a u = \frac{u \cdot a}{\|a\|^2}\,a = \frac{15}{21}(4, -1, 2) = \left(\frac{20}{7}, -\frac{5}{7}, \frac{10}{7}\right)$$

and the vector component of u orthogonal to a is

$$u - \operatorname{proj}_a u = (2, -1, 3) - \left(\frac{20}{7}, -\frac{5}{7}, \frac{10}{7}\right) = \left(-\frac{6}{7}, -\frac{2}{7}, \frac{11}{7}\right)$$

As a check, you may wish to verify that the vectors $u - \operatorname{proj}_a u$ and a are perpendicular by showing that their dot product is zero.
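Formulas 10 and 11 translate directly into code. The minimal sketch below uses Python's standard `fractions.Fraction` so that the components come out exact, as they do in Example 5. The helper name `decompose` is our own. The last line performs the check suggested in the text.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def decompose(u, a):
    """Split u into (proj_a u, u - proj_a u) using Formulas 10 and 11.

    Fraction keeps every component exact instead of rounding to floats.
    """
    k = Fraction(dot(u, a), dot(a, a))        # k = (u . a) / ||a||^2
    w1 = tuple(k * x for x in a)              # vector component along a
    w2 = tuple(x - y for x, y in zip(u, w1))  # vector component orthogonal to a
    return w1, w2


w1, w2 = decompose((2, -1, 3), (4, -1, 2))
print(w1)                   # (Fraction(20, 7), Fraction(-5, 7), Fraction(10, 7))
print(w2)                   # (Fraction(-6, 7), Fraction(-2, 7), Fraction(11, 7))
print(dot(w2, (4, -1, 2)))  # 0, so w2 is orthogonal to a
```

`Fraction(dot(u, a), dot(a, a))` raises `ZeroDivisionError` when a = 0, which matches the requirement $a \ne 0$ in the Projection Theorem.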


Figure 3.3.3 (the line L at angle $\theta$, with $e_1 = (1, 0)$ and $e_2 = (0, 1)$ and their projections on L)











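Sometimes we will be more interested in the norm of the vector component of u along a than in the vector component itself. A formula for this norm can be derived as follows:

$$\|\operatorname{proj}_a u\| = \left\|\frac{u \cdot a}{\|a\|^2}\,a\right\| = \left|\frac{u \cdot a}{\|a\|^2}\right|\|a\| = \frac{|u \cdot a|}{\|a\|^2}\,\|a\|$$

where the second equality follows from part (c) of Theorem 3.2.1 and the third from the fact that $\|a\|^2 > 0$. Thus,

$$\|\operatorname{proj}_a u\| = \frac{|u \cdot a|}{\|a\|} \qquad (12)$$

If $\theta$ denotes the angle between u and a, then $u \cdot a = \|u\|\,\|a\|\cos\theta$, so Formula 12 can also be written as

$$\|\operatorname{proj}_a u\| = \|u\|\,|\cos\theta| \qquad (13)$$

(Verify.) A geometric interpretation of this result is given in Figure 3.3.4.

Figure 3.3.4: (a) $0 \le \theta < \pi/2$, where the projection has length $\|u\|\cos\theta$; (b) $\pi/2 < \theta \le \pi$, where the projection has length $-\|u\|\cos\theta$.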
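Formulas 12 and 13 give the same number in two different ways. The minimal sketch below checks this numerically, using the vectors of Example 5 as sample data. The helper names `norm` and `proj_norm` are our own.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def proj_norm(u, a):
    """||proj_a u|| computed from Formula 12: |u . a| / ||a||."""
    return abs(dot(u, a)) / norm(a)


u, a = (2, -1, 3), (4, -1, 2)
via_12 = proj_norm(u, a)
cos_theta = dot(u, a) / (norm(u) * norm(a))
via_13 = norm(u) * abs(cos_theta)                # Formula 13: ||u|| |cos(theta)|

print(math.isclose(via_12, via_13))              # True
print(math.isclose(via_12, 15 / math.sqrt(21)))  # True
```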


The Theorem of Pythagoras

In Section 3.2 we found that many theorems about vectors in $R^2$ and $R^3$ also hold in $R^n$. Another example of this is the following generalization of the Theorem of Pythagoras (Figure 3.3.5).

Theorem of Pythagoras in $R^n$

If u and v are orthogonal vectors in $R^n$ with the Euclidean inner product, then

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2 \qquad (14)$$

Proof Since u and v are orthogonal, we have $u \cdot v = 0$, from which it follows that

$$\|u + v\|^2 = (u + v) \cdot (u + v) = \|u\|^2 + 2(u \cdot v) + \|v\|^2 = \|u\|^2 + \|v\|^2$$

EXAMPLE 6 Theorem of Pythagoras in $R^4$

We showed in Example 1 that the vectors

$$u = (-2, 3, 1, 4) \quad\text{and}\quad v = (1, 2, 0, -1)$$

are orthogonal. Verify the Theorem of Pythagoras for these vectors.

Solution

We leave it for you to confirm that

$$u + v = (-1, 5, 1, 3)$$
$$\|u + v\|^2 = 36$$
$$\|u\|^2 + \|v\|^2 = 30 + 6$$

Thus, $\|u + v\|^2 = \|u\|^2 + \|v\|^2$.
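The confirmations left to the reader in Example 6 can be scripted. The sketch below is minimal and assumes only the vectors of Example 1. It works with squared norms computed as dot products, so every quantity stays an exact integer and no square roots are needed.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


u = (-2, 3, 1, 4)
v = (1, 2, 0, -1)
s = tuple(x + y for x, y in zip(u, v))

print(s)                       # (-1, 5, 1, 3)
print(dot(u, v))               # 0: u and v are orthogonal
print(dot(s, s))               # 36 = ||u + v||^2
print(dot(u, u) + dot(v, v))   # 36 = ||u||^2 + ||v||^2
```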




Figure 3.3.5 


OPTIONAL 

Distance Problems 

We will now show how orthogonal projections can be used to solve the following three distance problems:

Problem 1. Find the distance between a point and a line in $R^2$.

Problem 2. Find the distance between a point and a plane in $R^3$.

Problem 3. Find the distance between two parallel planes in $R^3$.

A method for solving the first two problems is provided by the next theorem. Since the proofs of the two parts are similar, we will prove part (b) and leave part (a) as an exercise.


THEOREM 3.3.4

(a) In $R^2$ the distance D between the point $P_0(x_0, y_0)$ and the line ax + by + c = 0 is

$$D = \frac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}} \qquad (15)$$

(b) In $R^3$ the distance D between the point $P_0(x_0, y_0, z_0)$ and the plane ax + by + cz + d = 0 is

$$D = \frac{|ax_0 + by_0 + cz_0 + d|}{\sqrt{a^2 + b^2 + c^2}} \qquad (16)$$






Proof (b) Let $Q(x_1, y_1, z_1)$ be any point in the plane. Position the normal n = (a, b, c) so that its initial point is at Q. As illustrated in Figure 3.3.6, the distance D is equal to the length of the orthogonal projection of $\overrightarrow{QP_0}$ on n. Thus, it follows from Formula 12 that

$$D = \|\operatorname{proj}_n \overrightarrow{QP_0}\| = \frac{|\overrightarrow{QP_0} \cdot n|}{\|n\|}$$

But

$$\overrightarrow{QP_0} = (x_0 - x_1,\ y_0 - y_1,\ z_0 - z_1)$$
$$\overrightarrow{QP_0} \cdot n = a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1)$$
$$\|n\| = \sqrt{a^2 + b^2 + c^2}$$

Thus

$$D = \frac{|a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1)|}{\sqrt{a^2 + b^2 + c^2}} \qquad (17)$$

Since the point $Q(x_1, y_1, z_1)$ lies in the given plane, its coordinates satisfy the equation of that plane; thus

$$ax_1 + by_1 + cz_1 + d = 0$$

or

$$d = -ax_1 - by_1 - cz_1$$

Substituting this expression in Formula 17 yields Formula 16.


EXAMPLE 7 Distance Between a Point and a Plane

Find the distance D between the point (1, -4, -3) and the plane 2x - 3y + 6z = -1.

Solution

Since the distance formulas in Theorem 3.3.4 require that the equations of the line and plane be written with zero on the right side, we first need to rewrite the equation of the plane as

$$2x - 3y + 6z + 1 = 0$$

from which we obtain

$$D = \frac{|2(1) + (-3)(-4) + 6(-3) + 1|}{\sqrt{2^2 + (-3)^2 + 6^2}} = \frac{|-3|}{7} = \frac{3}{7}$$

Figure 3.3.6
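Formulas 15 and 16 are one-liners once the equation has zero on the right side, as Example 7 requires. The minimal sketch below uses our own function names. It takes the coefficients of ax + by + c = 0 or ax + by + cz + d = 0 directly and reproduces Example 7 and the answer to Exercise 29 of Exercise Set 3.3.

```python
import math


def point_line_distance(a, b, c, x0, y0):
    """Formula 15: distance from (x0, y0) to the line ax + by + c = 0 in R^2."""
    return abs(a * x0 + b * y0 + c) / math.hypot(a, b)


def point_plane_distance(a, b, c, d, x0, y0, z0):
    """Formula 16: distance from (x0, y0, z0) to the plane ax + by + cz + d = 0 in R^3."""
    return abs(a * x0 + b * y0 + c * z0 + d) / math.sqrt(a * a + b * b + c * c)


# Example 7: point (1, -4, -3), plane 2x - 3y + 6z + 1 = 0
print(point_plane_distance(2, -3, 6, 1, 1, -4, -3))   # 3/7, approximately 0.4286
# Exercise 29: point (-3, 1), line 4x + 3y + 4 = 0
print(point_line_distance(4, 3, 4, -3, 1))             # 1.0
```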


The third distance problem posed above is to find the distance between two parallel planes in $R^3$. As suggested in Figure 3.3.7, the distance between a plane V and a plane W can be obtained by finding any point $P_0$ in one of the planes, and computing the distance between that point and the other plane. Here is an example.

Figure 3.3.7

The distance between the parallel planes V and W is equal to the distance between $P_0$ and W.


EXAMPLE 8 Distance Between Parallel Planes

The planes

$$x + 2y - 2z = 3 \quad\text{and}\quad 2x + 4y - 4z = 7$$

are parallel since their normals, (1, 2, -2) and (2, 4, -4), are parallel vectors. Find the distance between these planes.

Solution

To find the distance D between the planes, we can select an arbitrary point in one of the planes and compute its distance to the other plane. By setting y = z = 0 in the equation x + 2y - 2z = 3, we obtain the point $P_0(3, 0, 0)$ in this plane. From Formula 16, applied with the second plane rewritten as 2x + 4y - 4z - 7 = 0, the distance between $P_0$ and that plane is

$$D = \frac{|2(3) + 4(0) + (-4)(0) - 7|}{\sqrt{2^2 + 4^2 + (-4)^2}} = \frac{1}{6}$$
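The method of Example 8 (pick a point on one plane, then apply Formula 16 to the other) can be sketched as follows. This is a minimal illustration under two stated assumptions. First, the caller guarantees that the planes are parallel; the helper does not check this. Second, the point $P_0$ is found by a hypothetical rule of our own: set every coordinate except the first one with a nonzero coefficient to 0, as Example 8 does with y = z = 0.

```python
import math


def parallel_plane_distance(n1, d1, n2, d2):
    """Distance between parallel planes n1 . x + d1 = 0 and n2 . x + d2 = 0.

    Assumes the normals n1 and n2 are parallel; this is not verified here.
    """
    # Choose P0 on the first plane: solve for the first coordinate whose
    # coefficient is nonzero and set all other coordinates to 0.
    idx = next(i for i, c in enumerate(n1) if c != 0)
    p0 = [0.0] * len(n1)
    p0[idx] = -d1 / n1[idx]
    # Formula 16 with the second plane
    num = abs(sum(c * x for c, x in zip(n2, p0)) + d2)
    return num / math.sqrt(sum(c * c for c in n2))


# Example 8: x + 2y - 2z - 3 = 0 and 2x + 4y - 4z - 7 = 0
print(parallel_plane_distance((1, 2, -2), -3, (2, 4, -4), -7))  # approximately 0.1667 (= 1/6)
```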


Concept Review 

Orthogonal (perpendicular) vectors 

Orthogonal set of vectors 

Normal to a line 

Normal to a plane 

Point-normal equations 

Vector form of a line 

Vector form of a plane 

Orthogonal projection of u on a 

Vector component of u along a 

Vector component of u orthogonal to a 

Theorem of Pythagoras 

Skills 

Determine whether two vectors are orthogonal. 

Determine whether a given set of vectors forms an orthogonal set. 

Find equations for lines (or planes) by using a normal vector and a point on the line (or plane). 
Find the vector form of a line or plane through the origin. 

Compute the vector component of u along a and orthogonal to a. 









Find the distance between a point and a line in $R^2$ or $R^3$.
Find the distance between two parallel planes in $R^3$.
Find the distance between a point and a plane.


Exercise Set 3.3 

In Exercises 1-2, determine whether u and v are orthogonal vectors. 

1. (a) u = (6, 1, 4), v = (2, 0, -3)

(b) u = (0, 0, -1), v = (1, 1, 1)

(c) u = (-6, 0, 4), v = (3, 1, 6)

(d) u = (2, 4, -8), v = (5, 3, 7)

Answer:

(a) Orthogonal

(b) Not orthogonal

(c) Not orthogonal

(d) Not orthogonal

2. (a) u = (2, 3), v = (5, -7)

(b) u = (-6, -2), v = (4, 0)

(c) u = (1, -5, 4), v = (3, 3, 3)

(d) u = (-2, 2, 3), v = (1, 7, -4)

In Exercises 3-4, determine whether the vectors form an orthogonal set.

3. (a) $v_1 = (2, 3)$, $v_2 = (3, 2)$

(b) $v_1 = (-1, 1)$, $v_2 = (1, 1)$

(c) $v_1 = (-2, 1, 1)$, $v_2 = (1, 0, 2)$, $v_3 = (-2, -5, 1)$

(d) $v_1 = (-3, 4, -1)$, $v_2 = (1, 2, 5)$, $v_3 = (4, -3, 0)$

Answer:

(a) Not an orthogonal set

(b) Orthogonal set

(c) Orthogonal set

(d) Not an orthogonal set

4. (a) $v_1 = (2, 3)$, $v_2 = (-3, 2)$

(b) $v_1 = (1, -2)$, $v_2 = (-2, 1)$

(c) $v_1 = (1, 0, 1)$, $v_2 = (1, 1, 1)$, $v_3 = (-1, 0, 1)$

(d) $v_1 = (2, -2, 1)$, $v_2 = (2, 1, -2)$, $v_3 = (1, 2, 2)$

5. Find a unit vector that is orthogonal to both u = (1, 0, 1) and v = (0, 1, 


Answer: 


± (/ 5 ’ k '^) 

6. (a) Show that v = (a, b) and w = (-b, a) are orthogonal vectors.

(b) Use the result in part (a) to find two vectors that are orthogonal to v = (2, -3).

(c) Find two unit vectors that are orthogonal to (-3, 4).

7. Do the points A(1, 1, 1), B(-2, 0, 3), and C(-3, -1, 1) form the vertices of a right triangle? Explain your answer.

Answer:

Yes

8. Repeat Exercise 7 for the points A(3, 0, 2), B(4, 3, 0), and C(8, 1, -1).

In Exercises 9-12, find a point-normal form of the equation of the plane passing through P and having n as a normal.

9. P(-1, 3, -2); n = (-2, 1, -1)

Answer:

-2(x + 1) + (y - 3) - (z + 2) = 0

10. P(1, 1, 4); n = (1, 9, 8)

11. P(2, 0, 0); n = (0, 0, 2)

Answer:

2z = 0

12. P(0, 0, 0); n = (1, 2, 3)

In Exercises 13-16, determine whether the given planes are parallel.

13. 4x - y + 2z = 5 and 7x - 3y + 4z = 8

Answer:

Not parallel

14. x - 4y - 3z - 2 = 0 and 3x - 12y - 9z - 7 = 0
15 - 2y = 8 x - Az + 5 and x = ^y 

Answer: 

Parallel 

16. $(-4, 1, 2) \cdot (x, y, z) = 0$ and $(8, -2, -4) \cdot (x, y, z) = 0$

In Exercises 17-18, determine whether the given planes are perpendicular.

17. 3x - y + z - 4 = 0, x + 2z = -1

Answer:

Not perpendicular

18. x - 2y + 3z = 4, -2x + 5y + 4z = -1

In Exercises 19-20, find $\|\operatorname{proj}_a u\|$.

19. (a) u = (1, -2), a = (-4, -3)

(b) u = (3, 0, 4), a = (2, 3, 3)

Answer:

(a) 2/5

(b) $18/\sqrt{22}$

20. (a) u = (5, 6), a = (2, -1)

(b) u = (3, -2, 6), a = (1, 2, -7)

In Exercises 21-28, find the vector component of u along a and the vector component of u orthogonal to a.

21. u = (6, 2), a = (3, -9)

Answer:

(0, 0), (6, 2)

22. u = (-1, -2), a = (-2, 3)

23. u = (3, 1, -7), a = (1, 0, 5)

Answer:

$\left(-\frac{16}{13}, 0, -\frac{80}{13}\right)$, $\left(\frac{55}{13}, 1, -\frac{11}{13}\right)$

24. u = (1, 0, 0), a = (4, 3, 8)

25. u = (1, 1, 1), a = (0, 2, -1)

Answer:

$\left(0, \frac{2}{5}, -\frac{1}{5}\right)$, $\left(1, \frac{3}{5}, \frac{6}{5}\right)$

26. u = (2, 0, 1), a = (1, 2, 3)

27. u = (2, 1, 1, 2), a = (4, -4, 2, -2)

Answer:

$\left(\frac{1}{5}, -\frac{1}{5}, \frac{1}{10}, -\frac{1}{10}\right)$, $\left(\frac{9}{5}, \frac{6}{5}, \frac{9}{10}, \frac{21}{10}\right)$

28. u = (5, 0, -3, 7), a = (2, 1, -1, -1)

In Exercises 29-32, find the distance between the point and the line.

29. 4x + 3y + 4 = 0; (-3, 1)

Answer:

1

30. x - 3y + 2 = 0; (-1, 4)

31. y = -4x + 2; (2, -5)

Answer:

$1/\sqrt{17}$

32. 3x + y = 5; (1, 8)

In Exercises 33-36, find the distance between the point and the plane.

33. (3, 1, -2); x + 2y - 2z = 4

Answer:

5/3

34. (-1, -1, 2); 2x + 5y - 6z = 4

35. (-1, 2, 1); 2x + 3y - 4z = 1

Answer:

$1/\sqrt{29}$

36. (0, 3, -2); x - y - z = 3

In Exercises 37-40, find the distance between the given parallel planes.

37. 2x - y - z = 5 and -4x + 2y + 2z = 12

Answer:

$11/\sqrt{6}$

38. 3x - 4y + z = 1 and 6x - 8y + 2z = 3

39. -4x + y - 3z = 0 and 8x - 2y + 6z = 0

Answer:

0 (The planes coincide.)

40. 2x - y + z = 1 and 2x - y + z = -1

41. Let i, j, and k be unit vectors along the positive x, y, and z axes of a rectangular coordinate system in 3-space. If v = (a, b, c) is a nonzero vector, then the angles $\alpha$, $\beta$, and $\gamma$ between v and the vectors i, j, and k, respectively, are called the direction angles of v (Figure Ex-41), and the numbers $\cos\alpha$, $\cos\beta$, and $\cos\gamma$ are called the direction cosines of v.

(a) Show that $\cos\alpha = a/\|v\|$.

(b) Find $\cos\beta$ and $\cos\gamma$.

(c) Show that $v/\|v\| = (\cos\alpha, \cos\beta, \cos\gamma)$.

(d) Show that $\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1$.

Figure Ex-41

Answer:

42. Use the result in Exercise 41 to estimate, to the nearest degree, the angles that a diagonal of a box with dimensions 10 cm × 15 cm × 25 cm makes with the edges of the box.

43. Show that if v is orthogonal to both $w_1$ and $w_2$, then v is orthogonal to $k_1w_1 + k_2w_2$ for all scalars $k_1$ and $k_2$.

44. Let u and v be nonzero vectors in 2- or 3-space, and let $k = \|u\|$ and $l = \|v\|$. Show that the vector $w = lu + kv$ bisects the angle between u and v.

45. Prove part (a) of Theorem 3.3.4. 

46. Is it possible to have 

proj a u = proj a a ? 

Explain your reasoning. 

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The vectors (3, — 1, 2) and (0, 0, 0) are orthogonal. 

Answer: 

True 

(b) If u and v are orthogonal vectors, then for all nonzero scalars k and m, ku and mv are orthogonal vectors.

Answer:

True

(c) The orthogonal projection of u along a is perpendicular to the vector component of u orthogonal to a.

Answer:

True

(d) If a and b are orthogonal vectors, then for every nonzero vector u, we have

$$\operatorname{proj}_a(\operatorname{proj}_b(u)) = 0$$

Answer:

True

(e) If a and u are nonzero vectors, then

$$\operatorname{proj}_a(\operatorname{proj}_a(u)) = \operatorname{proj}_a(u)$$

Answer:

True

(f) If the relationship

$$\operatorname{proj}_a u = \operatorname{proj}_a v$$

holds for some nonzero vector a, then u = v.

Answer:

False

(g) For all vectors u and v, it is true that

$$\|u + v\| = \|u\| + \|v\|$$


Answer: 


False 





3.4 The Geometry of Linear Systems 

In this section we will use parametric and vector methods to study general systems of linear equations. This work will enable us to interpret solution sets of linear systems with n unknowns as geometric objects in $R^n$, just as we interpreted solution sets of linear systems with two and three unknowns as points, lines, and planes in $R^2$ and $R^3$.


Vector and Parametric Equations of Lines in $R^2$ and $R^3$

In the last section we derived equations of lines and planes that are determined by a point and a normal vector. However, there are other useful ways of specifying lines and planes. For example, a unique line in $R^2$ or $R^3$ is determined by a point $x_0$ on the line and a nonzero vector v parallel to the line, and a unique plane in $R^3$ is determined by a point $x_0$ in the plane and two noncollinear vectors $v_1$ and $v_2$ parallel to the plane. The best way to visualize this is to translate the vectors so their initial points are at $x_0$ (Figure 3.4.1).



Figure 3.4.1 

Let us begin by deriving an equation for the line L that contains the point $x_0$ and is parallel to v. If x is a general point on such a line, then, as illustrated in Figure 3.4.2, the vector $x - x_0$ will be some scalar multiple of v, say

$$x - x_0 = tv \quad\text{or equivalently}\quad x = x_0 + tv$$

As the variable t (called a parameter) varies from $-\infty$ to $\infty$, the point x traces out the line L. Accordingly, we have the following result.


THEOREM 3.4.1

Let L be the line in $R^2$ or $R^3$ that contains the point $x_0$ and is parallel to the nonzero vector v. Then the equation of the line through $x_0$ that is parallel to v is

$$x = x_0 + tv \qquad (1)$$

If $x_0 = 0$, then the line passes through the origin and the equation has the form

$$x = tv \qquad (2)$$


Although it is not stated explicitly, it is understood in Formulas 1 and 2 that the parameter t varies from $-\infty$ to $\infty$. This applies to all vector and parametric equations in this text except where stated otherwise.







Figure 3.4.2 


Vector and Parametric Equations of Planes in $R^3$

Next we will derive an equation for the plane W that contains the point $x_0$ and is parallel to the noncollinear vectors $v_1$ and $v_2$. As shown in Figure 3.4.3, if x is any point in the plane, then by forming suitable scalar multiples of $v_1$ and $v_2$, say $t_1v_1$ and $t_2v_2$, we can create a parallelogram with diagonal $x - x_0$ and adjacent sides $t_1v_1$ and $t_2v_2$. Thus, we have

$$x - x_0 = t_1v_1 + t_2v_2 \quad\text{or equivalently}\quad x = x_0 + t_1v_1 + t_2v_2$$

Figure 3.4.3

As the variables $t_1$ and $t_2$ (called parameters) vary independently from $-\infty$ to $\infty$, the point x varies over the entire plane W. Accordingly, we have the following result.


THEOREM 3.4.2

Let W be the plane in $R^3$ that contains the point $x_0$ and is parallel to the noncollinear vectors $v_1$ and $v_2$. Then an equation of the plane through $x_0$ that is parallel to $v_1$ and $v_2$ is given by

$$x = x_0 + t_1v_1 + t_2v_2 \qquad (3)$$

If $x_0 = 0$, then the plane passes through the origin and the equation has the form

$$x = t_1v_1 + t_2v_2 \qquad (4)$$


Observe that the line through $x_0$ represented by Equation 1 is the translation by $x_0$ of the line through the origin represented by Equation 2, and that the plane through $x_0$ represented by Equation 3 is the translation by $x_0$ of the plane through the origin represented by Equation 4 (Figure 3.4.4).







Figure 3.4.4 (the line $x = x_0 + tv$ as the translation by $x_0$ of the line $x = tv$)


Motivated by the forms of Formulas 1 to 4, we can extend the notions of line and plane to $R^n$ by making the following definitions.


DEFINITION 1

If $x_0$ and v are vectors in $R^n$, and if v is nonzero, then the equation

$$x = x_0 + tv \qquad (5)$$

defines the line through $x_0$ that is parallel to v. In the special case where $x_0 = 0$, the line is said to pass through the origin.


DEFINITION 2

If $x_0$, $v_1$, and $v_2$ are vectors in $R^n$, and if $v_1$ and $v_2$ are not collinear, then the equation

$$x = x_0 + t_1v_1 + t_2v_2 \qquad (6)$$

defines the plane through $x_0$ that is parallel to $v_1$ and $v_2$. In the special case where $x_0 = 0$, the plane is said to pass through the origin.


Equations 5 and 6 are called vector forms of a line and plane in $R^n$. If the vectors in these equations are expressed in terms of their components and the corresponding components on each side are equated, then the resulting equations are called parametric equations of the line and plane. Here are some examples.


EXAMPLE 1 Vector and Parametric Equations of Lines in $R^2$ and $R^3$

(a) Find a vector equation and parametric equations of the line in $R^2$ that passes through the origin and is parallel to the vector v = (-2, 3).

(b) Find a vector equation and parametric equations of the line in $R^3$ that passes through the point $P_0(1, 2, -3)$ and is parallel to the vector v = (4, -5, 1).

(c) Use the vector equation obtained in part (b) to find two points on the line that are different from $P_0$.

Solution

(a) It follows from Formula 5 with $x_0 = 0$ that a vector equation of the line is x = tv. If we let x = (x, y), then this equation can be expressed in vector form as

$$(x, y) = t(-2, 3)$$

Equating corresponding components on the two sides of this equation yields the parametric equations

$$x = -2t, \quad y = 3t$$

(b) It follows from Formula 5 that a vector equation of the line is $x = x_0 + tv$. If we let x = (x, y, z), and if we take $x_0 = (1, 2, -3)$, then this equation can be expressed in vector form as

$$(x, y, z) = (1, 2, -3) + t(4, -5, 1) \qquad (7)$$

Equating corresponding components on the two sides of this equation yields the parametric equations

$$x = 1 + 4t, \quad y = 2 - 5t, \quad z = -3 + t$$

(c) A point on the line represented by Equation 7 can be obtained by substituting a specific numerical value for the parameter t. However, since t = 0 produces (x, y, z) = (1, 2, -3), which is the point $P_0$, this value of t does not serve our purpose. Taking t = 1 produces the point (5, -3, -2) and taking t = -1 produces the point (-3, 7, -4). Any other distinct values for t (except t = 0) would work just as well.
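Part (c) amounts to evaluating Formula 5 at a few parameter values. The minimal sketch below does exactly that; `point_on_line` is a hypothetical helper name. It confirms the two points found in the example and shows that t = 0 just returns $P_0$.

```python
def point_on_line(x0, v, t):
    """Return x0 + t v (Formula 5) for a point x0, direction v, and scalar t."""
    return tuple(p + t * d for p, d in zip(x0, v))


x0 = (1, 2, -3)
v = (4, -5, 1)
print(point_on_line(x0, v, 1))    # (5, -3, -2)
print(point_on_line(x0, v, -1))   # (-3, 7, -4)
print(point_on_line(x0, v, 0))    # (1, 2, -3), which is P0 itself
```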



EXAMPLE 2 Vector and Parametric Equations of a Plane in $R^3$

Find vector and parametric equations of the plane x - y + 2z = 5.

Solution

We will find the parametric equations first. We can do this by solving the equation for any one of the variables in terms of the other two and then using those two variables as parameters. For example, solving for x in terms of y and z yields

$$x = 5 + y - 2z \qquad (8)$$

and then using y and z as parameters $t_1$ and $t_2$, respectively, yields the parametric equations

$$x = 5 + t_1 - 2t_2, \quad y = t_1, \quad z = t_2$$

We would have obtained different parametric and vector equations in Example 2 had we solved the equation of the plane for y or z rather than x. However, one can show that the same plane results in all three cases as the parameters vary from $-\infty$ to $\infty$.

To obtain a vector equation of the plane we rewrite these parametric equations as

$$(x, y, z) = (5 + t_1 - 2t_2,\ t_1,\ t_2)$$

or, equivalently, as

$$(x, y, z) = (5, 0, 0) + t_1(1, 1, 0) + t_2(-2, 0, 1)$$
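A quick sanity check on Example 2 is to substitute a few parameter pairs into the vector equation and confirm that each resulting point satisfies x - y + 2z = 5. The minimal sketch below does this; `plane_point` is our own name. Passing the check for sample values is evidence, not a proof that the whole plane is covered.

```python
def plane_point(t1, t2):
    """Point of the plane x - y + 2z = 5 from the vector equation
    (x, y, z) = (5, 0, 0) + t1 (1, 1, 0) + t2 (-2, 0, 1)."""
    return (5 + t1 - 2 * t2, t1, t2)


for t1, t2 in [(0, 0), (1, 0), (0, 1), (3, -2), (-1.5, 4)]:
    x, y, z = plane_point(t1, t2)
    assert x - y + 2 * z == 5   # every sampled parameter pair lands on the plane
print("all sample points satisfy x - y + 2z = 5")
```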


EXAMPLE 3 Vector and Parametric Equations of Lines and Planes in $R^4$

(a) Find vector and parametric equations of the line through the origin of $R^4$ that is parallel to the vector v = (5, -3, 6, 1).

(b) Find vector and parametric equations of the plane in $R^4$ that passes through the point $x_0 = (2, -1, 0, 3)$ and is parallel to both $v_1 = (1, 5, 2, -4)$ and $v_2 = (0, 7, -8, 6)$.

Solution

(a) If we let $x = (x_1, x_2, x_3, x_4)$, then the vector equation x = tv can be expressed as

$$(x_1, x_2, x_3, x_4) = t(5, -3, 6, 1)$$

Equating corresponding components yields the parametric equations

$$x_1 = 5t, \quad x_2 = -3t, \quad x_3 = 6t, \quad x_4 = t$$

(b) The vector equation $x = x_0 + t_1v_1 + t_2v_2$ can be expressed as

$$(x_1, x_2, x_3, x_4) = (2, -1, 0, 3) + t_1(1, 5, 2, -4) + t_2(0, 7, -8, 6)$$

which yields the parametric equations

$$x_1 = 2 + t_1$$
$$x_2 = -1 + 5t_1 + 7t_2$$
$$x_3 = 2t_1 - 8t_2$$
$$x_4 = 3 - 4t_1 + 6t_2$$
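The parametric equations of part (b) can be cross-checked against the vector equation by evaluating both at the same parameter values. The minimal sketch below is one way to do this. `affine_combination` implements Formula 6 componentwise, and `parametric` transcribes the four equations by hand; both names are our own.

```python
def affine_combination(x0, v1, v2, t1, t2):
    """Return x0 + t1*v1 + t2*v2 componentwise (Formula 6)."""
    return tuple(p + t1 * a + t2 * b for p, a, b in zip(x0, v1, v2))


def parametric(t1, t2):
    """The parametric equations from part (b), written out by hand."""
    return (2 + t1, -1 + 5 * t1 + 7 * t2, 2 * t1 - 8 * t2, 3 - 4 * t1 + 6 * t2)


x0 = (2, -1, 0, 3)
v1 = (1, 5, 2, -4)
v2 = (0, 7, -8, 6)
print(affine_combination(x0, v1, v2, 1, 1))   # (3, 11, -6, 5)
print(parametric(1, 1))                       # (3, 11, -6, 5)
```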


Lines Through Two Points in $R^n$

If $x_0$ and $x_1$ are distinct points in $R^n$, then the line determined by these points is parallel to the vector $v = x_1 - x_0$ (Figure 3.4.5), so it follows from Formula 5 that the line can be expressed in vector form as

$$x = x_0 + t(x_1 - x_0) \qquad (9)$$

or, equivalently, as

$$x = (1 - t)x_0 + tx_1 \qquad (10)$$

These are called the two-point vector equations of a line in $R^n$.

EXAMPLE 4 A Line Through Two Points in $R^2$

Find vector and parametric equations for the line in $R^2$ that passes through the points P(0, 7) and Q(5, 0).

Solution

We will see below that it does not matter which point we take to be $x_0$ and which we take to be $x_1$, so let us choose $x_0 = (0, 7)$ and $x_1 = (5, 0)$. It follows that $x_1 - x_0 = (5, -7)$ and hence that

$$(x, y) = (0, 7) + t(5, -7) \qquad (11)$$

which we can rewrite in parametric form as

$$x = 5t, \quad y = 7 - 7t$$

Had we reversed our choices and taken $x_0 = (5, 0)$ and $x_1 = (0, 7)$, then the resulting vector equation would have been

$$(x, y) = (5, 0) + t(-5, 7) \qquad (12)$$

and the parametric equations would have been

$$x = 5 - 5t, \quad y = 7t$$

(verify). Although Equations 11 and 12 look different, they both represent the line whose equation in rectangular coordinates is

$$7x + 5y = 35$$

(Figure 3.4.6). This can be seen by eliminating the parameter t from the parametric equations (verify).
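The "(verify)" steps in Example 4 can be spot-checked by sampling parameter values in both parametrizations and confirming that every point satisfies 7x + 5y = 35. The minimal sketch below does this with Formula 10, using exact `Fraction` arithmetic so that no rounding can hide or fake a mismatch. `two_point_line` is a hypothetical helper name.

```python
from fractions import Fraction


def two_point_line(x0, x1, t):
    """Formula 10: x = (1 - t) x0 + t x1."""
    return tuple((1 - t) * p + t * q for p, q in zip(x0, x1))


P, Q = (0, 7), (5, 0)
for t in [Fraction(0), Fraction(1), Fraction(1, 3), Fraction(-2), Fraction(5, 2)]:
    x, y = two_point_line(P, Q, t)   # choice x0 = P, x1 = Q, as in Equation 11
    assert 7 * x + 5 * y == 35
    x, y = two_point_line(Q, P, t)   # reversed choice, as in Equation 12
    assert 7 * x + 5 * y == 35
print("both parametrizations stay on 7x + 5y = 35")
```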


Figure 3.4.5

Figure 3.4.6


The point x = (x, y) in Equations 9 and 10 traces an entire line in g 2 as the parameter t varies over the interval ( — oo, oo). If, however, 
we restrict the parameter to vary from t — Q to t = 1, then x will not trace the entire line but rather just the line segment joining the points 
xq and xi. The point x will start at xq when t — Q and end at xj when t =\. Accordingly, we make the following definition. 


DEFINITION 3 

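If $\mathbf{x}_0$ and $\mathbf{x}_1$ are vectors in $R^n$, then the equation

$$\mathbf{x} = \mathbf{x}_0 + t(\mathbf{x}_1 - \mathbf{x}_0) \quad (0 \le t \le 1) \qquad (13)$$

defines the line segment from $\mathbf{x}_0$ to $\mathbf{x}_1$. When convenient, Equation 13 can be written as

$$\mathbf{x} = (1 - t)\mathbf{x}_0 + t\mathbf{x}_1 \quad (0 \le t \le 1) \qquad (14)$$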




EXAMPLE 5 A Line Segment from One Point to Another in $R^2$

It follows from 13 and 14 that the line segment in $R^2$ from $\mathbf{x}_0 = (1, -3)$ to $\mathbf{x}_1 = (5, 6)$ can be represented either by the equation

$$\mathbf{x} = (1, -3) + t(4, 9) \quad (0 \le t \le 1)$$

or by

$$\mathbf{x} = (1 - t)(1, -3) + t(5, 6) \quad (0 \le t \le 1)$$
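Definition 3 can be turned into a small function that rejects parameter values outside $[0, 1]$. This is a minimal Python sketch under our own naming (`segment_point` is hypothetical). It evaluates Equation 13 for the segment of Example 5 and, in the accompanying checks, compares the result with Equation 14.

```python
from fractions import Fraction

def segment_point(x0, x1, t):
    """Point on the segment from x0 to x1: x = x0 + t (x1 - x0), 0 <= t <= 1 (Equation 13)."""
    if not 0 <= t <= 1:
        raise ValueError("t must lie in [0, 1] for a line segment")
    return tuple(a + t * (b - a) for a, b in zip(x0, x1))

x0, x1 = (1, -3), (5, 6)
print(segment_point(x0, x1, 0))               # (1, -3), the starting point
print(segment_point(x0, x1, 1))               # (5, 6), the end point
print(segment_point(x0, x1, Fraction(1, 2)))  # (Fraction(3, 1), Fraction(3, 2)), the midpoint
```

Raising an error outside $[0, 1]$ is a design choice for the sketch. Dropping the check gives back the full line of Equation 9.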


Dot Product Form of a Linear System 

Our next objective is to show how to express linear equations and linear systems in dot product notation. This will lead us to some 
important results about orthogonality and linear systems. 

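Recall that a linear equation in the variables $x_1, x_2, \ldots, x_n$ has the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \quad (a_1, a_2, \ldots, a_n \text{ not all zero}) \qquad (15)$$

and that the corresponding homogeneous equation is

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0 \quad (a_1, a_2, \ldots, a_n \text{ not all zero}) \qquad (16)$$

These equations can be rewritten in vector form by letting

$$\mathbf{a} = (a_1, a_2, \ldots, a_n) \quad \text{and} \quad \mathbf{x} = (x_1, x_2, \ldots, x_n)$$

in which case Formula 15 can be written as

$$\mathbf{a} \cdot \mathbf{x} = b \qquad (17)$$

and Formula 16 as

$$\mathbf{a} \cdot \mathbf{x} = 0 \qquad (18)$$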


Except for a notational change from $\mathbf{n}$ to $\mathbf{a}$, Formula 18 is the extension to $R^n$ of Formula 6 in Section 3.3. This equation reveals that each solution vector $\mathbf{x}$ of a homogeneous equation is orthogonal to the coefficient vector $\mathbf{a}$. To take this geometric observation a step further, consider the homogeneous system

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
&\ \ \vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0
\end{aligned}$$

If we denote the successive row vectors of the coefficient matrix by $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_m$, then we can rewrite this system in dot product form as

$$\begin{aligned}
\mathbf{r}_1 \cdot \mathbf{x} &= 0 \\
\mathbf{r}_2 \cdot \mathbf{x} &= 0 \\
&\ \ \vdots \\
\mathbf{r}_m \cdot \mathbf{x} &= 0
\end{aligned} \qquad (19)$$

from which we see that every solution vector $\mathbf{x}$ is orthogonal to every row vector of the coefficient matrix. In summary, we have the following result.


THEOREM 3.4.3 

If $A$ is an $m \times n$ matrix, then the solution set of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$ consists of all vectors in $R^n$ that are orthogonal to every row vector of $A$.


EXAMPLE 6 Orthogonality of Row Vectors and Solution Vectors 


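We showed in Example 6 of Section 1.2 that the general solution of the homogeneous linear system

$$\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 + 15x_6 &= 0 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned}$$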




is

$$x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0$$

which we can rewrite in vector form as

$$\mathbf{x} = (-3r - 4s - 2t,\ r,\ -2s,\ s,\ t,\ 0)$$


According to Theorem 3.4.3, the vector $\mathbf{x}$ must be orthogonal to each of the row vectors

$$\begin{aligned}
\mathbf{r}_1 &= (1, 3, -2, 0, 2, 0) \\
\mathbf{r}_2 &= (2, 6, -5, -2, 4, -3) \\
\mathbf{r}_3 &= (0, 0, 5, 10, 0, 15) \\
\mathbf{r}_4 &= (2, 6, 0, 8, 4, 18)
\end{aligned}$$








We will confirm that $\mathbf{x}$ is orthogonal to $\mathbf{r}_1$ and leave it for you to verify that $\mathbf{x}$ is orthogonal to the other three row vectors as well. The dot product of $\mathbf{r}_1$ and $\mathbf{x}$ is

$$\mathbf{r}_1 \cdot \mathbf{x} = 1(-3r - 4s - 2t) + 3(r) + (-2)(-2s) + 0(s) + 2(t) + 0(0) = 0$$

which establishes the orthogonality.
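The verification left to the reader can be automated. The following minimal Python sketch is our own (the names `ROWS`, `dot`, and `general_solution` are hypothetical). It samples integer values of $r$, $s$, and $t$ and checks, as Theorem 3.4.3 predicts, that the resulting solution vector has zero dot product with all four row vectors.

```python
import random

# Row vectors r1..r4 of the coefficient matrix in Example 6.
ROWS = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]

def dot(a, b):
    """Dot product of two vectors in R^n."""
    return sum(p * q for p, q in zip(a, b))

def general_solution(r, s, t):
    """x = (-3r - 4s - 2t, r, -2s, s, t, 0), the general solution in Example 6."""
    return (-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, 0)

random.seed(0)
for _ in range(100):
    r, s, t = (random.randint(-50, 50) for _ in range(3))
    x = general_solution(r, s, t)
    assert all(dot(row, x) == 0 for row in ROWS)  # Theorem 3.4.3
print("x is orthogonal to r1, r2, r3, and r4 for every sampled r, s, t")
```

Random sampling only illustrates the result. The symbolic dot products, like the one computed above for $\mathbf{r}_1$, are what prove it.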


The Relationship Between Ax = 0 and Ax = b 


We will conclude this section by exploring the relationship between the solutions of a homogeneous linear system $A\mathbf{x} = \mathbf{0}$ and the solutions (if any) of a nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ that has the same coefficient matrix. These are called corresponding linear systems.


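To motivate the result we are seeking, let us compare the solutions of the corresponding linear systems

$$\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 + 15x_6 &= 0 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned}
\qquad \text{and} \qquad
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= -1 \\
5x_3 + 10x_4 + 15x_6 &= 5 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 6
\end{aligned}$$

We showed in Example 5 and Example 6 of Section 1.2 that the general solutions of these linear systems can be written in parametric form as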


homogeneous → $x_1 = -3r - 4s - 2t,\ x_2 = r,\ x_3 = -2s,\ x_4 = s,\ x_5 = t,\ x_6 = 0$

nonhomogeneous → $x_1 = -3r - 4s - 2t,\ x_2 = r,\ x_3 = -2s,\ x_4 = s,\ x_5 = t,\ x_6 = \frac{1}{3}$

which we can then rewrite in vector form as

homogeneous → $(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t,\ r,\ -2s,\ s,\ t,\ 0)$

nonhomogeneous → $(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t,\ r,\ -2s,\ s,\ t,\ \frac{1}{3})$

By splitting the vectors on the right apart and collecting terms with like parameters, we can rewrite these equations as

homogeneous → $(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) \qquad (20)$

nonhomogeneous → $(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) + (0, 0, 0, 0, 0, \frac{1}{3}) \qquad (21)$

Formulas 20 and 21 reveal that each solution of the nonhomogeneous system can be obtained by adding the fixed vector $(0, 0, 0, 0, 0, \frac{1}{3})$ to the corresponding solution of the homogeneous system. This is a special case of the following general result.


THEOREM 3.4.4 

The general solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding any specific solution of $A\mathbf{x} = \mathbf{b}$ to the general solution of $A\mathbf{x} = \mathbf{0}$.


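Proof Let $\mathbf{x}_0$ be any specific solution of $A\mathbf{x} = \mathbf{b}$, let $W$ denote the solution set of $A\mathbf{x} = \mathbf{0}$, and let $\mathbf{x}_0 + W$ denote the set of all vectors that result by adding $\mathbf{x}_0$ to each vector in $W$. We must show that if $\mathbf{x}$ is a vector in $\mathbf{x}_0 + W$, then $\mathbf{x}$ is a solution of $A\mathbf{x} = \mathbf{b}$, and conversely, that every solution of $A\mathbf{x} = \mathbf{b}$ is in the set $\mathbf{x}_0 + W$.

Assume first that $\mathbf{x}$ is a vector in $\mathbf{x}_0 + W$. This implies that $\mathbf{x}$ is expressible in the form $\mathbf{x} = \mathbf{x}_0 + \mathbf{w}$, where $A\mathbf{x}_0 = \mathbf{b}$ and $A\mathbf{w} = \mathbf{0}$. Thus,

$$A\mathbf{x} = A(\mathbf{x}_0 + \mathbf{w}) = A\mathbf{x}_0 + A\mathbf{w} = \mathbf{b} + \mathbf{0} = \mathbf{b}$$

which shows that $\mathbf{x}$ is a solution of $A\mathbf{x} = \mathbf{b}$.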














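Conversely, let $\mathbf{x}$ be any solution of $A\mathbf{x} = \mathbf{b}$. To show that $\mathbf{x}$ is in the set $\mathbf{x}_0 + W$ we must show that $\mathbf{x}$ is expressible in the form

$$\mathbf{x} = \mathbf{x}_0 + \mathbf{w} \qquad (22)$$

where $\mathbf{w}$ is in $W$ (i.e., $A\mathbf{w} = \mathbf{0}$). We can do this by taking $\mathbf{w} = \mathbf{x} - \mathbf{x}_0$. This vector obviously satisfies 22, and it is in $W$ since

$$A\mathbf{w} = A(\mathbf{x} - \mathbf{x}_0) = A\mathbf{x} - A\mathbf{x}_0 = \mathbf{b} - \mathbf{b} = \mathbf{0}$$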



The solution set of $A\mathbf{x} = \mathbf{b}$ is a translation of the solution space of $A\mathbf{x} = \mathbf{0}$.


Theorem 3.4.4 has a useful geometric interpretation that is illustrated in Figure 3.4.7. If, as discussed in Section 3.1, we interpret vector addition as translation, then the theorem states that if $\mathbf{x}_0$ is any specific solution of $A\mathbf{x} = \mathbf{b}$, then the entire solution set of $A\mathbf{x} = \mathbf{b}$ can be obtained by translating the solution set of $A\mathbf{x} = \mathbf{0}$ by the vector $\mathbf{x}_0$.
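Theorem 3.4.4 can be illustrated on the systems compared above. The following is a minimal Python sketch of our own (`mat_vec`, `W_BASIS`, and `general_solution` are hypothetical names). We take the right-hand side $\mathbf{b} = (0, -1, 5, 6)$ from Example 5 of Section 1.2, and the code itself asserts that $\mathbf{x}_0 = (0, 0, 0, 0, 0, \frac{1}{3})$ solves $A\mathbf{x} = \mathbf{b}$, that the three spanning vectors from Formula 20 solve $A\mathbf{x} = \mathbf{0}$, and that their translate $\mathbf{x}_0 + \mathbf{w}$ solves $A\mathbf{x} = \mathbf{b}$.

```python
from fractions import Fraction

# Coefficient matrix shared by the corresponding systems, and the right-hand side b.
A = [
    [1, 3, -2, 0, 2, 0],
    [2, 6, -5, -2, 4, -3],
    [0, 0, 5, 10, 0, 15],
    [2, 6, 0, 8, 4, 18],
]
b = [0, -1, 5, 6]

def mat_vec(M, x):
    """Matrix-vector product Mx, computed row by row as dot products."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# A specific solution of Ax = b: the fixed vector in Formula 21.
x0 = [0, 0, 0, 0, 0, Fraction(1, 3)]

# Vectors spanning the solution space W of Ax = 0 (Formula 20).
W_BASIS = [
    [-3, 1, 0, 0, 0, 0],
    [-4, 0, -2, 1, 0, 0],
    [-2, 0, 0, 0, 1, 0],
]

def general_solution(r, s, t):
    """x = x0 + w, where w = r*w1 + s*w2 + t*w3 lies in W (Theorem 3.4.4)."""
    w = [r * p + s * q + t * u for p, q, u in zip(*W_BASIS)]
    return [a + c for a, c in zip(x0, w)]

assert mat_vec(A, x0) == b
for w in W_BASIS:
    assert mat_vec(A, w) == [0, 0, 0, 0]
print(mat_vec(A, general_solution(2, -1, 5)) == b)  # True
```

The final check holds for any $r$, $s$, $t$ because $A(\mathbf{x}_0 + \mathbf{w}) = A\mathbf{x}_0 + A\mathbf{w} = \mathbf{b}$, which is exactly the first half of the proof above.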


Concept Review 

Parameters 

Parametric equations of lines 
Parametric equations of planes 
Two-point vector equations of a line 
Vector equation of a line 
Vector equation of a plane 

Skills 

Express the equations of lines in $R^2$ and $R^3$ using either vector or parametric equations.

Express the equations of planes in $R^n$ using either vector or parametric equations.

Express the equation of a line containing two given points in $R^2$ or $R^3$ using either vector or parametric equations.

Find equations of a line and a line segment. 

Verify the orthogonality of the row vectors of a linear system of equations and a solution vector. 

Use a specific solution to the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ and the general solution of the corresponding linear system $A\mathbf{x} = \mathbf{0}$ to obtain the general solution to $A\mathbf{x} = \mathbf{b}$.


Exercise Set 3.4 

In Exercises 1-4, find vector and parametric equations of the line containing the point and parallel to the vector. 
1. Point: $(-4, 1)$; vector: $\mathbf{v} = (0, -8)$

Answer: 

Vector equation: $(x, y) = (-4, 1) + t(0, -8)$;

parametric equations: $x = -4$, $y = 1 - 8t$


2. Point: $(2, -1)$; vector: $\mathbf{v} = (-4, -2)$

3. Point: $(0, 0, 0)$; vector: $\mathbf{v} = (-3, 0, 1)$

Answer: 

Vector equation: $(x, y, z) = t(-3, 0, 1)$;

parametric equations: $x = -3t$, $y = 0$, $z = t$

4. Point: $(-9, 3, 4)$; vector: $\mathbf{v} = (-1, 6, 0)$

In Exercises 5-8, use the given equation of a line to find a point on the line and a vector parallel to the line. 

5. $\mathbf{x} = (3 - 5t, -6 - t)$

Answer: 

Point: $(3, -6)$; parallel vector: $(-5, -1)$

6. $(x, y, z) = (4t, 7, 4 + 3t)$

7. $\mathbf{x} = (1 - t)(4, 6) + t(-2, 0)$

Answer: 

Point: $(4, 6)$; parallel vector: $(-6, -6)$

8. $\mathbf{x} = (1 - t)(0, -5, 1)$

In Exercises 9-12, find vector and parametric equations of the plane containing the given point and parallel vectors. 

9. Point: $(-3, 1, 0)$; vectors: $\mathbf{v}_1 = (0, -3, 6)$ and $\mathbf{v}_2 = (-5, 1, 2)$

Answer: 

Vector equation: $(x, y, z) = (-3, 1, 0) + t_1(0, -3, 6) + t_2(-5, 1, 2)$;

parametric equations: $x = -3 - 5t_2$, $y = 1 - 3t_1 + t_2$, $z = 6t_1 + 2t_2$

10. Point: $(0, 6, -2)$; vectors: $\mathbf{v}_1 = (0, 9, -1)$ and $\mathbf{v}_2 = (0, -3, 0)$

11. Point: $(-1, 1, 4)$; vectors: $\mathbf{v}_1 = (6, -1, 0)$ and $\mathbf{v}_2 = (-1, 3, 1)$

Answer: 

Vector equation: $(x, y, z) = (-1, 1, 4) + t_1(6, -1, 0) + t_2(-1, 3, 1)$;

parametric equations: $x = -1 + 6t_1 - t_2$, $y = 1 - t_1 + 3t_2$, $z = 4 + t_2$

12. Point: $(0, 5, -4)$; vectors: $\mathbf{v}_1 = (0, 0, -5)$ and $\mathbf{v}_2 = (1, -3, -2)$

In Exercises 13-14, find vector and parametric equations of the line in $R^2$ that passes through the origin and is orthogonal to $\mathbf{v}$.

13. $\mathbf{v} = (-2, 3)$

Answer: 

A possible answer is vector equation: $(x, y) = t(3, 2)$;

parametric equations: $x = 3t$, $y = 2t$

14. v = (1, -4) 

In Exercises 15-16, find vector and parametric equations of the plane in $R^3$ that passes through the origin and is orthogonal to $\mathbf{v}$.

15. $\mathbf{v} = (4, 0, -5)$ [Hint: Construct two nonparallel vectors orthogonal to $\mathbf{v}$ in $R^3$.]


Answer: 



A possible answer is vector equation: $(x, y, z) = t_1(0, 1, 0) + t_2(5, 0, 4)$;

parametric equations: $x = 5t_2$, $y = t_1$, $z = 4t_2$

16. $\mathbf{v} = (3, 1, -6)$

In Exercises 17-20, find the general solution to the linear system and confirm that the row vectors of the coefficient matrix are orthogonal 
to the solution vectors. 

17. $$\begin{aligned} x_1 + x_2 + x_3 &= 0 \\ 2x_1 + 2x_2 + 2x_3 &= 0 \\ 3x_1 + 3x_2 + 3x_3 &= 0 \end{aligned}$$

Answer:

$x_1 = -s - t$, $x_2 = s$, $x_3 = t$

18. $$\begin{aligned} x_1 + 3x_2 - 4x_3 &= 0 \\ 2x_1 + 6x_2 - 8x_3 &= 0 \end{aligned}$$

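19. $$\begin{aligned} x_1 + 5x_2 + x_3 + 2x_4 - x_5 &= 0 \\ x_1 - 2x_2 - x_3 + 3x_4 + 2x_5 &= 0 \end{aligned}$$

Answer:

$x_1 = \frac{3}{7}r - \frac{19}{7}s - \frac{8}{7}t$, $x_2 = -\frac{2}{7}r + \frac{1}{7}s + \frac{3}{7}t$, $x_3 = r$, $x_4 = s$, $x_5 = t$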

20. $$\begin{aligned} x_1 + 3x_2 - 4x_3 &= 0 \\ x_1 + 2x_2 + 3x_3 &= 0 \end{aligned}$$

21. (a) The equation $x + y + z = 1$ can be viewed as a linear system of one equation in three unknowns. Express a general solution of this equation as a particular solution plus a general solution of the associated homogeneous system.

(b) Give a geometric interpretation of the result in part (a). 

Answer: 

(a) $(1, 0, 0) + s(-1, 1, 0) + t(-1, 0, 1)$

(b) a plane in $R^3$ passing through $P(1, 0, 0)$ and parallel to $(-1, 1, 0)$ and $(-1, 0, 1)$

22. (a) The equation $x + y = 1$ can be viewed as a linear system of one equation in two unknowns. Express a general solution of this equation as a particular solution plus a general solution of the associated homogeneous system.

(b) Give a geometric interpretation of the result in part (a). 

23. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in $R^3$ that are orthogonal to $\mathbf{a} = (1, 1, 1)$ and $\mathbf{b} = (-2, 3, 0)$.

(b) What kind of geometric object is the solution space? 

(c) Find a general solution of the system obtained in part (a), and confirm that Theorem 3.4.3 holds. 

Answer: 

(a) $$\begin{aligned} x + y + z &= 0 \\ -2x + 3y &= 0 \end{aligned}$$

(b) a line through the origin in $R^3$

(c) $x = -\frac{3}{5}t$, $y = -\frac{2}{5}t$, $z = t$

24. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in $R^3$ that are orthogonal to $\mathbf{a} = (-3, 2, -1)$ and $\mathbf{b} = (0, -2, -2)$.

(b) What kind of geometric object is the solution space? 



(c) Find a general solution of the system obtained in part (a), and confirm that Theorem 3.4.3 holds. 


25. Consider the linear systems

(a) Find a general solution of the homogeneous system.

(b) Confirm that $x_1 = 1$, $x_2 = 0$, $x_3 = 1$ is a solution of the nonhomogeneous system.

(c) Use the results in parts (a) and (b) to find a general solution of the nonhomogeneous system. 

(d) Check your result in part (c) by solving the nonhomogeneous system directly. 

Answer: 

a - *i = -j s + j*, *2 =s, *3 =t 
c - *1 = 1 —-jS+yC *2 =s, * 3=1 +t 
26. Consider the linear systems

(a) Find a general solution of the homogeneous system.

(b) Confirm that $x_1 = 1$, $x_2 = 1$, $x_3 = 1$ is a solution of the nonhomogeneous system.

(c) Use the results in parts (a) and (b) to find a general solution of the nonhomogeneous system. 

(d) Check your result in part (c) by solving the nonhomogeneous system directly. 

In Exercises 27-28, find a general solution of the system, and use that solution to find a general solution of the associated homogeneous system and a particular solution of the given system.

Answer:

x\ = -j — js — -j t, *2 =s > x 3 = £> * 4 = 1 ; The general solution of the associated homogeneous system is 
* 1 = — — jt, *2 =s, *3 = t, *4 = 0. A particular solution of the given system is * \ = -i, *2 = 0, *3 = 0, *4 = 1 . 
True-False Exercises


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 






































(a) The vector equation of a line can be determined from any point lying on the line and a nonzero vector parallel to the line. 
Answer: 

True 

(b) The vector equation of a plane can be determined from any point lying in the plane and a nonzero vector parallel to the plane. 
Answer: 

False 

(c) The points lying on a line through the origin in $R^2$ or $R^3$ are all scalar multiples of any nonzero vector on the line.

Answer: 

True 

(d) All solution vectors of the linear system $A\mathbf{x} = \mathbf{b}$ are orthogonal to the row vectors of the matrix $A$ if and only if $\mathbf{b} = \mathbf{0}$.
Answer: 

True 

(e) The general solution of the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding $\mathbf{b}$ to the general solution of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$.

Answer: 

False 

(f) If $\mathbf{x}_1$ and $\mathbf{x}_2$ are two solutions of the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$, then $\mathbf{x}_1 - \mathbf{x}_2$ is a solution of the corresponding homogeneous linear system.

Answer: 

True

3.5 Cross Product

This optional section is concerned with properties of vectors in 3-space that are important to physicists and 
engineers. It can be omitted, if desired, since subsequent sections do not depend on its content. Among other 
things, we define an operation that provides a way of constructing a vector in 3-space that is perpendicular to two 
given vectors, and we give a geometric interpretation of 3 x 3 determinants. 


Cross Product of Vectors 

In Section 3.2 we defined the dot product of two vectors u and v in n-space. That operation produced a scalar as its
result. We will now define a type of vector multiplication that produces a vector as the result but which is 
applicable only to vectors in 3-space. 


DEFINITION 1 


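If $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ are vectors in 3-space, then the cross product $\mathbf{u} \times \mathbf{v}$ is the vector defined by

$$\mathbf{u} \times \mathbf{v} = (u_2v_3 - u_3v_2,\ u_3v_1 - u_1v_3,\ u_1v_2 - u_2v_1)$$

or, in determinant notation,

$$\mathbf{u} \times \mathbf{v} = \left( \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix},\ -\begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix},\ \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix} \right) \qquad (1)$$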


Instead of memorizing 1, you can obtain the components of u x v as follows: 


Form the $2 \times 3$ matrix

$$\begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}$$

whose first row contains the components of $\mathbf{u}$ and whose second row contains the components of $\mathbf{v}$.

To find the first component of $\mathbf{u} \times \mathbf{v}$, delete the first column and take the determinant; to find the second component, delete the second column and take the negative of the determinant; and to find the third component, delete the third column and take the determinant.
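The column-deletion recipe translates directly into code. The following is a minimal Python sketch of our own (`det2` and `cross` are hypothetical helper names, not from the text). For numerical work on large arrays, a library routine such as NumPy's `numpy.cross` would be the more usual choice.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def cross(u, v):
    """Cross product u x v of two vectors in 3-space, following Formula 1."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (
        det2(u2, u3, v2, v3),   # delete column 1, take the determinant
        -det2(u1, u3, v1, v3),  # delete column 2, take the negative of the determinant
        det2(u1, u2, v1, v2),   # delete column 3, take the determinant
    )

print(cross((1, 2, -2), (3, 0, 1)))  # (2, -7, -6)
```

The sample call uses the vectors of Example 1, which follows.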


EXAMPLE 1 Calculating a Cross Product 

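Find $\mathbf{u} \times \mathbf{v}$, where $\mathbf{u} = (1, 2, -2)$ and $\mathbf{v} = (3, 0, 1)$.

Solution From either 1 or the mnemonic in the preceding remark, we have

$$\mathbf{u} \times \mathbf{v} = \left( \begin{vmatrix} 2 & -2 \\ 0 & 1 \end{vmatrix},\ -\begin{vmatrix} 1 & -2 \\ 3 & 1 \end{vmatrix},\ \begin{vmatrix} 1 & 2 \\ 3 & 0 \end{vmatrix} \right) = (2, -7, -6)$$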
















The following theorem gives some important relationships between the dot product and cross product and also 
shows that u x v is orthogonal to both u and v. 


The cross product notation $\mathbf{A} \times \mathbf{B}$ was introduced by the American physicist and mathematician J. Willard Gibbs (see p. 134) in a series of unpublished lecture notes for his students at Yale University. It appeared in a published work for the first time in the second edition of the book Vector Analysis by Edwin Wilson (1879-1964), a student of Gibbs. Gibbs originally referred to $\mathbf{A} \times \mathbf{B}$ as the “skew product.”


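THEOREM 3.5.1 Relationships Involving Cross Product and Dot Product

If u, v, and w are vectors in 3-space, then

(a) $\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = 0$ (u × v is orthogonal to u)

(b) $\mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = 0$ (u × v is orthogonal to v)

(c) $\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$ (Lagrange's identity)

(d) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{u} \cdot \mathbf{v})\mathbf{w}$ (relationship between cross and dot products)

(e) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u}$ (relationship between cross and dot products)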


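Proof (a) Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$. Then

$$\begin{aligned}
\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) &= (u_1, u_2, u_3) \cdot (u_2v_3 - u_3v_2,\ u_3v_1 - u_1v_3,\ u_1v_2 - u_2v_1) \\
&= u_1(u_2v_3 - u_3v_2) + u_2(u_3v_1 - u_1v_3) + u_3(u_1v_2 - u_2v_1) = 0
\end{aligned}$$

Proof (b) Similar to (a).

Proof (c) Since

$$\|\mathbf{u} \times \mathbf{v}\|^2 = (u_2v_3 - u_3v_2)^2 + (u_3v_1 - u_1v_3)^2 + (u_1v_2 - u_2v_1)^2 \qquad (2)$$

and

$$\|\mathbf{u}\|^2\|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 = (u_1^2 + u_2^2 + u_3^2)(v_1^2 + v_2^2 + v_3^2) - (u_1v_1 + u_2v_2 + u_3v_3)^2 \qquad (3)$$

the proof can be completed by “multiplying out” the right sides of 2 and 3 and verifying their equality.

Proofs (d) and (e) See Exercises 38 and 39.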
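Because every part of Theorem 3.5.1 is a polynomial identity, it can be spot-checked exactly with integer vectors. The following Python sketch is our own (`check_theorem_3_5_1` and the small helpers are hypothetical names). Random testing of this kind is a sanity check, not a substitute for the proofs above and in Exercises 38 and 39.

```python
import random

def dot(u, v):
    """Dot product of two vectors in 3-space."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product u x v (Definition 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def scale(k, u):
    return tuple(k * a for a in u)

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def check_theorem_3_5_1(u, v, w):
    """Assert parts (a)-(e) of Theorem 3.5.1 exactly for integer vectors u, v, w."""
    uxv = cross(u, v)
    assert dot(u, uxv) == 0                                                         # (a)
    assert dot(v, uxv) == 0                                                         # (b)
    assert dot(uxv, uxv) == dot(u, u) * dot(v, v) - dot(u, v) ** 2                  # (c)
    assert cross(u, cross(v, w)) == sub(scale(dot(u, w), v), scale(dot(u, v), w))  # (d)
    assert cross(uxv, w) == sub(scale(dot(u, w), v), scale(dot(v, w), u))          # (e)

random.seed(1)
for _ in range(200):
    u, v, w = (tuple(random.randint(-9, 9) for _ in range(3)) for _ in range(3))
    check_theorem_3_5_1(u, v, w)
print("parts (a)-(e) held for 200 random integer triples")
```

Integer inputs keep all arithmetic exact, so the `==` comparisons are meaningful. With floating-point vectors one would compare within a tolerance instead.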


EXAMPLE 2 u × v Is Perpendicular to u and to v


Consider the vectors

$$\mathbf{u} = (1, 2, -2) \quad \text{and} \quad \mathbf{v} = (3, 0, 1)$$

In Example 1 we showed that

$$\mathbf{u} \times \mathbf{v} = (2, -7, -6)$$

Since

$$\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = (1)(2) + (2)(-7) + (-2)(-6) = 0$$

and

$$\mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = (3)(2) + (0)(-7) + (1)(-6) = 0$$

u × v is orthogonal to both u and v, as guaranteed by Theorem 3.5.1.

Joseph Louis Lagrange (1736-1813)

Joseph Louis Lagrange was a French-Italian mathematician and astronomer. Although 
his father wanted him to become a lawyer, Lagrange was attracted to mathematics and astronomy after 
reading a memoir by the astronomer Halley. At age 16 he began to study mathematics on his own and by 
age 19 was appointed to a professorship at the Royal Artillery School in Turin. The following year he 
solved some famous problems using new methods that eventually blossomed into a branch of mathematics 
called the calculus of variations. These methods and Lagrange's applications of them to problems in 
celestial mechanics were so monumental that by age 25 he was regarded by many of his contemporaries as 
the greatest living mathematician. One of Lagrange's most famous works is a memoir, Mécanique Analytique, in which he reduced the theory of mechanics to a few general formulas from which all other
necessary equations could be derived. Napoleon was a great admirer of Lagrange and showered him with 
many honors. In spite of his fame, Lagrange was a shy and modest man. On his death, he was buried with 
honor in the Pantheon. 

[Image: ©SSPL/The Image Works ] 


The main arithmetic properties of the cross product are listed in the next theorem. 




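THEOREM 3.5.2 Properties of Cross Product

If u, v, and w are any vectors in 3-space and k is any scalar, then:

(a) $\mathbf{u} \times \mathbf{v} = -(\mathbf{v} \times \mathbf{u})$

(b) $\mathbf{u} \times (\mathbf{v} + \mathbf{w}) = (\mathbf{u} \times \mathbf{v}) + (\mathbf{u} \times \mathbf{w})$

(c) $(\mathbf{u} + \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \times \mathbf{w}) + (\mathbf{v} \times \mathbf{w})$

(d) $k(\mathbf{u} \times \mathbf{v}) = (k\mathbf{u}) \times \mathbf{v} = \mathbf{u} \times (k\mathbf{v})$

(e) $\mathbf{u} \times \mathbf{0} = \mathbf{0} \times \mathbf{u} = \mathbf{0}$

(f) $\mathbf{u} \times \mathbf{u} = \mathbf{0}$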


The proofs follow immediately from Formula 1 and properties of determinants; for example, part (a) can be proved 
as follows. 

Proof (a) Interchanging u and v in 1 interchanges the rows of the three determinants on the right side of 1 and hence changes the sign of each component in the cross product. Thus $\mathbf{u} \times \mathbf{v} = -(\mathbf{v} \times \mathbf{u})$.

The proofs of the remaining parts are left as exercises. 
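Before proving the remaining parts, it can help to see them hold on examples. The following is a minimal Python sketch of our own (`check_theorem_3_5_2` and the helpers are hypothetical names). It tests all six parts of Theorem 3.5.2 with random integer vectors and scalars.

```python
import random

def cross(u, v):
    """Cross product u x v (Definition 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(k, u):
    return tuple(k * a for a in u)

ZERO = (0, 0, 0)

def check_theorem_3_5_2(u, v, w, k):
    """Assert parts (a)-(f) of Theorem 3.5.2 for integer vectors and an integer scalar k."""
    assert cross(u, v) == scale(-1, cross(v, u))                                    # (a)
    assert cross(u, add(v, w)) == add(cross(u, v), cross(u, w))                     # (b)
    assert cross(add(u, v), w) == add(cross(u, w), cross(v, w))                     # (c)
    assert scale(k, cross(u, v)) == cross(scale(k, u), v) == cross(u, scale(k, v))  # (d)
    assert cross(u, ZERO) == cross(ZERO, u) == ZERO                                 # (e)
    assert cross(u, u) == ZERO                                                      # (f)

random.seed(2)
for _ in range(200):
    u, v, w = (tuple(random.randint(-9, 9) for _ in range(3)) for _ in range(3))
    check_theorem_3_5_2(u, v, w, random.randint(-5, 5))
print("parts (a)-(f) held for 200 random samples")
```

Part (a) shows that the cross product is not commutative. Passing random tests is evidence for the identities, while Exercise 40 asks for the actual proofs.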

EXAMPLE 3 Standard Unit Vectors 


Consider the vectors 

$$\mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad \mathbf{k} = (0, 0, 1)$$

These vectors each have length 1 and lie along the coordinate axes (Figure 3.5.1). They are called the standard unit vectors in 3-space. Every vector $\mathbf{v} = (v_1, v_2, v_3)$ in 3-space is expressible in terms of i, j, and k since we can write

$$\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$$


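For example,

$$(2, -3, 4) = 2\mathbf{i} - 3\mathbf{j} + 4\mathbf{k}$$

From 1 we obtain

$$\mathbf{i} \times \mathbf{j} = \left( \begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix},\ -\begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix},\ \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} \right) = (0, 0, 1) = \mathbf{k}$$

Figure 3.5.1 The standard unit vectors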








You should have no trouble obtaining the following results: 

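$$\begin{array}{lll}
\mathbf{i} \times \mathbf{i} = \mathbf{0} & \mathbf{j} \times \mathbf{j} = \mathbf{0} & \mathbf{k} \times \mathbf{k} = \mathbf{0} \\
\mathbf{i} \times \mathbf{j} = \mathbf{k} & \mathbf{j} \times \mathbf{k} = \mathbf{i} & \mathbf{k} \times \mathbf{i} = \mathbf{j} \\
\mathbf{j} \times \mathbf{i} = -\mathbf{k} & \mathbf{k} \times \mathbf{j} = -\mathbf{i} & \mathbf{i} \times \mathbf{k} = -\mathbf{j}
\end{array}$$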

Figure 3.5.2 is helpful for remembering these results. Referring to this diagram, the cross product of two 
consecutive vectors going clockwise is the next vector around, and the cross product of two consecutive vectors 
going counterclockwise is the negative of the next vector around. 


Figure 3.5.2
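The nine products in the table can be confirmed mechanically. The following is a minimal Python sketch of our own (`cross` and `neg` are hypothetical helpers). It checks that products taken in the cyclic order i, j, k give the next vector, that products in the reverse order give its negative, and that each vector crossed with itself gives 0.

```python
def cross(u, v):
    """Cross product u x v (Definition 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def neg(u):
    """Negative of a vector."""
    return tuple(-a for a in u)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Cyclic order i -> j -> k -> i gives the next vector around ...
assert cross(i, j) == k and cross(j, k) == i and cross(k, i) == j
# ... and the reverse order gives the negative of the next vector.
assert cross(j, i) == neg(k) and cross(k, j) == neg(i) and cross(i, k) == neg(j)
# Any vector crossed with itself is the zero vector.
assert all(cross(e, e) == (0, 0, 0) for e in (i, j, k))
print("unit-vector cross products agree with the table")
```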


Determinant Form of Cross Product 

It is also worth noting that a cross product can be represented symbolically in the form 

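$$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix}\mathbf{i} - \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix}\mathbf{j} + \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}\mathbf{k} \qquad (4)$$

For example, if $\mathbf{u} = (1, 2, -2)$ and $\mathbf{v} = (3, 0, 1)$, then

$$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2 & -2 \\ 3 & 0 & 1 \end{vmatrix} = 2\mathbf{i} - 7\mathbf{j} - 6\mathbf{k}$$

which agrees with the result obtained in Example 1.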


WARNING 

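It is not true in general that $\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$. For example,

$$\mathbf{i} \times (\mathbf{j} \times \mathbf{j}) = \mathbf{i} \times \mathbf{0} = \mathbf{0}$$

and

$$(\mathbf{i} \times \mathbf{j}) \times \mathbf{j} = \mathbf{k} \times \mathbf{j} = -\mathbf{i}$$

so

$$\mathbf{i} \times (\mathbf{j} \times \mathbf{j}) \neq (\mathbf{i} \times \mathbf{j}) \times \mathbf{j}$$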
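The counterexample in the warning is easy to reproduce. The following minimal Python sketch is our own, with a hypothetical `cross` helper. It computes both groupings for i, j, j and shows that they differ.

```python
def cross(u, v):
    """Cross product u x v (Definition 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

i, j = (1, 0, 0), (0, 1, 0)
left = cross(i, cross(j, j))   # i x 0 = 0
right = cross(cross(i, j), j)  # k x j = -i
print(left, right)             # (0, 0, 0) (-1, 0, 0)
```

Because the grouping matters, parentheses are never optional in an expression with two cross products.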


We know from Theorem 3.5.1 that u × v is orthogonal to both u and v. If u and v are nonzero vectors, it can be shown that the direction of u × v can be determined using the following “right-hand rule” (Figure 3.5.3): Let $\theta$ be the angle between u and v, and suppose u is rotated through the angle $\theta$ until it coincides with v. If the fingers of the right hand are cupped so that they point in the direction of rotation, then the thumb indicates (roughly) the direction of u × v.

Figure 3.5.3

You may find it instructive to practice this rule with the products 

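$$\mathbf{i} \times \mathbf{j} = \mathbf{k}, \quad \mathbf{j} \times \mathbf{k} = \mathbf{i}, \quad \mathbf{k} \times \mathbf{i} = \mathbf{j}$$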


Geometric Interpretation of Cross Product 

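If u and v are vectors in 3-space, then the norm of u × v has a useful geometric interpretation. Lagrange's identity, given in Theorem 3.5.1, states that

$$\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 \qquad (5)$$

If $\theta$ denotes the angle between u and v, then $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta$, so 5 can be rewritten as

$$\begin{aligned}
\|\mathbf{u} \times \mathbf{v}\|^2 &= \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - \|\mathbf{u}\|^2\|\mathbf{v}\|^2\cos^2\theta \\
&= \|\mathbf{u}\|^2\|\mathbf{v}\|^2(1 - \cos^2\theta) \\
&= \|\mathbf{u}\|^2\|\mathbf{v}\|^2\sin^2\theta
\end{aligned}$$

Since $0 \le \theta \le \pi$, it follows that $\sin\theta \ge 0$, so this can be rewritten as

$$\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\|\|\mathbf{v}\|\sin\theta \qquad (6)$$

But $\|\mathbf{v}\|\sin\theta$ is the altitude of the parallelogram determined by u and v (Figure 3.5.4). Thus, from 6, the area $A$ of this parallelogram is given by

$$A = (\text{base})(\text{altitude}) = \|\mathbf{u}\|\|\mathbf{v}\|\sin\theta = \|\mathbf{u} \times \mathbf{v}\|$$

This result is correct even if u and v are collinear, since the parallelogram determined by u and v then has zero area, and from 6 we have $\mathbf{u} \times \mathbf{v} = \mathbf{0}$ because $\theta = 0$ or $\theta = \pi$ (so $\sin\theta = 0$) in this case. Thus we have the following theorem.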


THEOREM 3.5.3 Area of a Parallelogram

If u and v are vectors in 3-space, then $\|\mathbf{u} \times \mathbf{v}\|$ is equal to the area of the parallelogram determined by u and v.


EXAMPLE 4 Area of a Triangle 

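Find the area of the triangle determined by the points $P_1(2, 2, 0)$, $P_2(-1, 0, 2)$, and $P_3(0, 4, 3)$.

Solution The area $A$ of the triangle is $\frac{1}{2}$ the area of the parallelogram determined by the vectors $\overrightarrow{P_1P_2}$ and $\overrightarrow{P_1P_3}$ (Figure 3.5.5). Using the method discussed in Example 1 of Section 3.1, $\overrightarrow{P_1P_2} = (-3, -2, 2)$ and $\overrightarrow{P_1P_3} = (-2, 2, 3)$. It follows that

$$\overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} = (-10, 5, -10)$$

(verify) and consequently that

$$A = \tfrac{1}{2}\left\|\overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3}\right\| = \tfrac{1}{2}(15) = \tfrac{15}{2}$$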
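The steps of Example 4 can be written as one short function. The following is a minimal Python sketch of our own (`triangle_area`, `sub`, and `norm` are hypothetical names). It forms $\overrightarrow{P_1P_2}$ and $\overrightarrow{P_1P_3}$, takes their cross product, and halves its norm, as Theorem 3.5.3 allows.

```python
import math

def cross(u, v):
    """Cross product u x v (Definition 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def sub(p, q):
    """Vector from point q to point p, i.e. p - q."""
    return tuple(a - b for a, b in zip(p, q))

def norm(u):
    """Euclidean norm ||u||."""
    return math.sqrt(sum(a * a for a in u))

def triangle_area(p1, p2, p3):
    """Half the area of the parallelogram spanned by P1P2 and P1P3."""
    return norm(cross(sub(p2, p1), sub(p3, p1))) / 2

print(triangle_area((2, 2, 0), (-1, 0, 2), (0, 4, 3)))  # 7.5
```

In this example the cross product has integer norm 15, so the floating-point result is exact. In general one should expect a rounded square root.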


DEFINITION 2 

If u, v, and w are vectors in 3-space, then 

$$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$$

is called the scalar triple product of u, v, and w. 



Figure 3.5.4

Figure 3.5.5













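The scalar triple product of $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$ can be calculated from the formula

$$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \qquad (7)$$

This follows from Formula 4 since

$$\begin{aligned}
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) &= \mathbf{u} \cdot \left( \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix}\mathbf{i} - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix}\mathbf{j} + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}\mathbf{k} \right) \\
&= \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix}u_1 - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix}u_2 + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}u_3
= \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}
\end{aligned}$$

EXAMPLE 5 Calculating a Scalar Triple Product

Calculate the scalar triple product $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$ of the vectors

$$\mathbf{u} = 3\mathbf{i} - 2\mathbf{j} - 5\mathbf{k}, \quad \mathbf{v} = \mathbf{i} + 4\mathbf{j} - 4\mathbf{k}, \quad \mathbf{w} = 3\mathbf{j} + 2\mathbf{k}$$

Solution From 7,

$$\begin{aligned}
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) &= \begin{vmatrix} 3 & -2 & -5 \\ 1 & 4 & -4 \\ 0 & 3 & 2 \end{vmatrix} \\
&= 3\begin{vmatrix} 4 & -4 \\ 3 & 2 \end{vmatrix} - (-2)\begin{vmatrix} 1 & -4 \\ 0 & 2 \end{vmatrix} + (-5)\begin{vmatrix} 1 & 4 \\ 0 & 3 \end{vmatrix} \\
&= 60 + 4 - 15 = 49
\end{aligned}$$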
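Formula 7 gives two routes to the same number, and both are easy to code. The following is a minimal Python sketch of our own (`det3`, `scalar_triple`, `cross`, and `dot` are hypothetical names). It computes Example 5 once as a $3 \times 3$ determinant, expanded by cofactors along the first row, and once directly as $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$.

```python
def det3(m):
    """Cofactor expansion of a 3 x 3 determinant along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cross(u, v):
    """Cross product u x v (Definition 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scalar_triple(u, v, w):
    """u . (v x w), computed as the determinant in Formula 7."""
    return det3((u, v, w))

u, v, w = (3, -2, -5), (1, 4, -4), (0, 3, 2)
print(scalar_triple(u, v, w))  # 49
print(dot(u, cross(v, w)))     # 49, directly from Definition 2
```

Cyclically permuting the arguments, as in `scalar_triple(w, u, v)`, leaves the value unchanged, which matches the relationships noted just below.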


The symbol $(\mathbf{u} \cdot \mathbf{v}) \times \mathbf{w}$ makes no sense because we cannot form the cross product of a scalar and a vector. Thus, no ambiguity arises if we write $\mathbf{u} \cdot \mathbf{v} \times \mathbf{w}$ rather than $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$. However, for clarity we will usually keep the parentheses.


It follows from 7 that 


$$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \mathbf{w} \cdot (\mathbf{u} \times \mathbf{v}) = \mathbf{v} \cdot (\mathbf{w} \times \mathbf{u})$$

since the $3 \times 3$ determinants that represent these products can be obtained from one another by two row interchanges. (Verify.) These relationships can be remembered by moving the vectors u, v, and w clockwise around the vertices of the triangle in Figure 3.5.6.

Figure 3.5.6


Geometric Interpretation of Determinants 

The next theorem provides a useful geometric interpretation of $2 \times 2$ and $3 \times 3$ determinants.


THEOREM 3.5.4 


(a) The absolute value of the determinant

$$\det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}$$

is equal to the area of the parallelogram in 2-space determined by the vectors $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$. (See Figure 3.5.7a.)

(b) The absolute value of the determinant

$$\det\begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix}$$

is equal to the volume of the parallelepiped in 3-space determined by the vectors $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$. (See Figure 3.5.7b.)

Figure 3.5.7


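Proof (a) The key to the proof is to use Theorem 3.5.3. However, that theorem applies to vectors in 3-space, whereas $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ are vectors in 2-space. To circumvent this “dimension problem,” we will view u and v as vectors in the xy-plane of an xyz-coordinate system (Figure 3.5.7c), in which case these vectors are expressed as $\mathbf{u} = (u_1, u_2, 0)$ and $\mathbf{v} = (v_1, v_2, 0)$. Thus

$$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & 0 \\ v_1 & v_2 & 0 \end{vmatrix} = \det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\mathbf{k}$$

It now follows from Theorem 3.5.3 and the fact that $\|\mathbf{k}\| = 1$ that the area $A$ of the parallelogram determined by u and v is

$$A = \|\mathbf{u} \times \mathbf{v}\| = \left\| \det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\mathbf{k} \right\| = \left| \det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right| \|\mathbf{k}\| = \left| \det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right|$$

which completes the proof.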
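The proof of part (a) can be mirrored numerically. The following minimal Python sketch is our own (`area_2d_det` and `area_2d_cross` are hypothetical names). It computes a 2-space parallelogram area once as $|\det|$ and once by embedding the vectors in the xy-plane and taking $\|\mathbf{u} \times \mathbf{v}\|$, and the two results agree.

```python
import math

def cross(u, v):
    """Cross product u x v (Definition 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def area_2d_det(u, v):
    """|det [[u1, u2], [v1, v2]]|, as in Theorem 3.5.4(a)."""
    return abs(u[0] * v[1] - u[1] * v[0])

def area_2d_cross(u, v):
    """||u x v|| after viewing u, v as (u1, u2, 0) and (v1, v2, 0) in the xy-plane."""
    c = cross((u[0], u[1], 0), (v[0], v[1], 0))
    assert c[0] == 0 and c[1] == 0  # u x v is a scalar multiple of k, as in the proof
    return math.sqrt(sum(x * x for x in c))

u, v = (3, 2), (3, 1)
print(area_2d_det(u, v), area_2d_cross(u, v))  # 3 3.0
```

The internal assertion restates the step in the proof where $\mathbf{u} \times \mathbf{v}$ turns out to be a multiple of $\mathbf{k}$.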


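Proof (b) As shown in Figure 3.5.8, take the base of the parallelepiped determined by u, v, and w to be the parallelogram determined by v and w. It follows from Theorem 3.5.3 that the area of the base is $\|\mathbf{v} \times \mathbf{w}\|$ and, as illustrated in Figure 3.5.8, the height $h$ of the parallelepiped is the length of the orthogonal projection of u on $\mathbf{v} \times \mathbf{w}$. Therefore, by Formula 12 of Section 3.3,

$$h = \|\operatorname{proj}_{\mathbf{v} \times \mathbf{w}} \mathbf{u}\| = \frac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|}$$

It follows that the volume $V$ of the parallelepiped is

$$V = (\text{area of base}) \cdot \text{height} = \|\mathbf{v} \times \mathbf{w}\| \, \frac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|$$

so from 7,

$$V = \left| \det\begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix} \right| \qquad (8)$$

which completes the proof.

Figure 3.5.8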


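If $V$ denotes the volume of the parallelepiped determined by vectors u, v, and w, then it follows from Formulas 7 and 8 that

$$V = \begin{bmatrix} \text{volume of parallelepiped} \\ \text{determined by } \mathbf{u}, \mathbf{v}, \text{ and } \mathbf{w} \end{bmatrix} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| \qquad (9)$$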


From this result and the discussion immediately following Definition 3 of Section 3.2, we can conclude that

$$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \pm V$$

where the + or − results depending on whether u makes an acute or an obtuse angle with v × w.
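Formula 9 and the sign remark can be illustrated together. The following is a minimal Python sketch of our own (`det3`, `signed_volume`, and `parallelepiped_volume` are hypothetical names). It returns both the signed value $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$ and its absolute value $V$, using the vectors of Exercise 17.

```python
def det3(m):
    """Cofactor expansion of a 3 x 3 determinant along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def signed_volume(u, v, w):
    """u . (v x w) = det[u; v; w] (Formula 7); this equals +V or -V."""
    return det3((u, v, w))

def parallelepiped_volume(u, v, w):
    """V = |u . (v x w)| (Formula 9)."""
    return abs(signed_volume(u, v, w))

u, v, w = (2, -6, 2), (0, 4, -2), (2, 2, -4)
print(signed_volume(u, v, w), parallelepiped_volume(u, v, w))  # -16 16
```

Here the negative sign indicates that u makes an obtuse angle with v × w. Swapping any two of the vectors flips the sign but not the volume.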


























Formula 9 leads to a useful test for ascertaining whether three given vectors lie in the same plane. Since three vectors not in the same plane determine a parallelepiped of positive volume, it follows from 9 that $|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| = 0$ if and only if the vectors u, v, and w lie in the same plane. Thus we have the following result.


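THEOREM 3.5.5

If the vectors $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$ have the same initial point, then they lie in the same plane if and only if

$$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = 0$$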
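Theorem 3.5.5 gives a one-line coplanarity test. The following minimal Python sketch is our own (`coplanar` and `det3` are hypothetical names). It applies the test to the vectors of Exercise 19 and to a triple built so that $\mathbf{w} = \mathbf{u} + \mathbf{v}$, which must lie in the plane of u and v.

```python
def det3(m):
    """Cofactor expansion of a 3 x 3 determinant along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def coplanar(u, v, w):
    """True when u, v, w (sharing an initial point) lie in one plane (Theorem 3.5.5)."""
    return det3((u, v, w)) == 0

print(coplanar((-1, -2, 1), (3, 0, -2), (5, -4, 0)))  # False (the determinant is 16)
u, v = (1, 2, 3), (4, 5, 6)
print(coplanar(u, v, tuple(a + b for a, b in zip(u, v))))  # True, since w = u + v
```

The exact `== 0` comparison is appropriate only for integer or rational entries. With floating-point data a small tolerance would be needed instead.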


Concept Review 

Cross product of two vectors 
Determinant form of cross product 
Scalar triple product 

Skills 

Compute the cross product of two vectors u and v in $R^3$.
Know the geometric relationship of u × v to u and v.

Know the properties of the cross product (listed in Theorem 3.5.2). 

Compute the scalar triple product of three vectors in 3-space. 

Know the geometric interpretation of the scalar triple product. 

Compute the areas of triangles and parallelograms determined by two vectors or three points in 2-space 
or 3-space. 

Use the scalar triple product to determine whether three given vectors in 3-space are coplanar.


Exercise Set 3.5 

In Exercises 1-2, let $\mathbf{u} = (3, 2, -1)$, $\mathbf{v} = (0, 2, -3)$, and $\mathbf{w} = (2, 6, 7)$. Compute the indicated vectors.

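1. (a) $\mathbf{v} \times \mathbf{w}$

(b) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$

(c) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$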


Answer: 




(a) $(32, -6, -4)$

(b) $(-14, -20, -82)$

(c) $(27, 40, -42)$

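2. (a) $(\mathbf{u} \times \mathbf{v}) \times (\mathbf{v} \times \mathbf{w})$

(b) $\mathbf{u} \times (\mathbf{v} - 2\mathbf{w})$

(c) $(\mathbf{u} \times \mathbf{v}) - 2\mathbf{w}$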

In Exercises 3-6, use the cross product to find a vector that is orthogonal to both u and v. 

3. $\mathbf{u} = (-6, 4, 2)$, $\mathbf{v} = (3, 1, 5)$

Answer: 

(18,36, -18) 

4 . u=(U, — 2 ), v = ( 2 , - 1 , 2 ) 

5. $\mathbf{u} = (-2, 1, 5)$, $\mathbf{v} = (3, 0, -3)$

Answer: 

(-3,9, -3) 

6. $\mathbf{u} = (3, 3, 1)$, $\mathbf{v} = (0, 4, 2)$

In Exercises 7-10, find the area of the parallelogram determined by the given vectors u and v. 

7. $\mathbf{u} = (1, -1, 2)$, $\mathbf{v} = (0, 3, 1)$

Answer:

$\sqrt{59}$

8. $\mathbf{u} = (3, -1, 4)$, $\mathbf{v} = (6, -2, 8)$

9. $\mathbf{u} = (2, 3, 0)$, $\mathbf{v} = (-1, 2, -2)$

Answer:

$\sqrt{101}$

10. $\mathbf{u} = (1, 1, 1)$, $\mathbf{v} = (3, 2, -5)$

In Exercises 11-12, find the area of the parallelogram with the given vertices. 

11. $P_1(1, 2)$, $P_2(4, 4)$, $P_3(7, 5)$, $P_4(4, 3)$

Answer: 

3 

12. $P_1(3, 2)$, $P_2(5, 4)$, $P_3(9, 4)$, $P_4(7, 2)$

In Exercises 13-14, find the area of the triangle with the given vertices. 

13. $A(2, 0)$, $B(3, 4)$, $C(-1, 2)$


Answer: 



7 

14. $A(1, 1)$, $B(2, 2)$, $C(3, -3)$

In Exercises 15-16, find the area of the triangle in 3-space that has the given vertices. 

15. $P_1(2, 6, -1)$, $P_2(1, 1, 1)$, $P_3(4, 6, 2)$

Answer:

$\frac{\sqrt{374}}{2}$

16. $P(1, -1, 2)$, $Q(0, 3, 4)$, $R(6, 1, 8)$

In Exercises 17-18, find the volume of the parallelepiped with sides u, v, and w. 

17. $\mathbf{u} = (2, -6, 2)$, $\mathbf{v} = (0, 4, -2)$, $\mathbf{w} = (2, 2, -4)$

Answer: 

16 

18. $\mathbf{u} = (3, 1, 2)$, $\mathbf{v} = (4, 5, 1)$, $\mathbf{w} = (1, 2, 4)$

In Exercises 19-20, determine whether u, v, and w lie in the same plane when positioned so that their initial 
points coincide. 

19. $\mathbf{u} = (-1, -2, 1)$, $\mathbf{v} = (3, 0, -2)$, $\mathbf{w} = (5, -4, 0)$

Answer: 

The vectors do not lie in the same plane. 

20. $\mathbf{u} = (5, -2, 1)$, $\mathbf{v} = (4, -1, 1)$, $\mathbf{w} = (1, -1, 0)$

In Exercises 21-24, compute the scalar triple product u • (v x w). 

21. $\mathbf{u} = (-2, 0, 6)$, $\mathbf{v} = (1, -3, 1)$, $\mathbf{w} = (-5, -1, 1)$

Answer: 

-92 

22. $\mathbf{u} = (-1, 2, 4)$, $\mathbf{v} = (3, 4, -2)$, $\mathbf{w} = (-1, 2, 5)$

23. $\mathbf{u} = (a, 0, 0)$, $\mathbf{v} = (0, b, 0)$, $\mathbf{w} = (0, 0, c)$

Answer: 

abc 

24. $\mathbf{u} = (3, -1, 6)$, $\mathbf{v} = (2, 4, 3)$, $\mathbf{w} = (5, -1, 2)$

In Exercises 25-26, suppose that $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = 3$. Find

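25. (a) $\mathbf{u} \cdot (\mathbf{w} \times \mathbf{v})$

(b) $(\mathbf{v} \times \mathbf{w}) \cdot \mathbf{u}$

(c) $\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v})$

Answer:

(a) $-3$

(b) $3$

(c) $3$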

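26. (a) $\mathbf{v} \cdot (\mathbf{u} \times \mathbf{w})$

(b) $(\mathbf{u} \times \mathbf{w}) \cdot \mathbf{v}$

(c) $\mathbf{v} \cdot (\mathbf{w} \times \mathbf{w})$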

27. (a) Find the area of the triangle having vertices $A(1, 0, 1)$, $B(0, 2, 3)$, and $C(2, 1, 0)$.

(b) Use the result of part (a) to find the length of the altitude from vertex C to side AB.

Answer:

(a) $\frac{\sqrt{26}}{2}$

(b) $\frac{\sqrt{26}}{3}$

28. Use the cross product to find the sine of the angle between the vectors $\mathbf{u} = (2, 3, -6)$ and $\mathbf{v} = (2, 3, 6)$.

29. Simplify $(\mathbf{u} + \mathbf{v}) \times (\mathbf{u} - \mathbf{v})$.

Answer:

$2(\mathbf{v} \times \mathbf{u})$

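30. Let $\mathbf{a} = (a_1, a_2, a_3)$, $\mathbf{b} = (b_1, b_2, b_3)$, $\mathbf{c} = (c_1, c_2, c_3)$, and $\mathbf{d} = (d_1, d_2, d_3)$. Show that

$$(\mathbf{a} + \mathbf{d}) \cdot (\mathbf{b} \times \mathbf{c}) = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) + \mathbf{d} \cdot (\mathbf{b} \times \mathbf{c})$$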


31. Let u, v, and w be nonzero vectors in 3-space with the same initial point, but such that no two of them are 
collinear. Show that 

(a) u x (v x w) lies in the plane determined by v and w. 

(b) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$ lies in the plane determined by u and v.


32. Prove the following identities. 

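(a) $(\mathbf{u} + k\mathbf{v}) \times \mathbf{v} = \mathbf{u} \times \mathbf{v}$

(b) $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{z}) = -(\mathbf{u} \times \mathbf{z}) \cdot \mathbf{v}$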


33. Prove: If a, b, c, and d lie in the same plane, then $(\mathbf{a} \times \mathbf{b}) \times (\mathbf{c} \times \mathbf{d}) = \mathbf{0}$.

34. Prove: If θ is the angle between u and v and u · v ≠ 0, then tan θ = ||u × v|| / (u · v).

35. Show that if u, v, and w are vectors in R³, no two of which are collinear, then u × (v × w) lies in the plane
determined by v and w.

36. It is a theorem of solid geometry that the volume of a tetrahedron is (1/3)(area of base) · (height). Use this result
to prove that the volume of a tetrahedron whose sides are the vectors a, b, and c is $\frac{1}{6}\lvert a \cdot (b \times c) \rvert$ (see the
accompanying figure).

Figure Ex-36

37. Use the result of Exercise 36 to find the volume of the tetrahedron with vertices P, Q, R, S.

(a) P(-1, 2, 0), Q(2, 1, -3), R(1, 1, 1), S(3, -2, 3)

(b) P(0, 0, 0), Q(1, 2, -1), R(3, 4, 0), S(-1, -3, 4)

Answer:

(a) 17/6

(b) 1/2
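Taking the tetrahedron-volume formula of Exercise 36 as given, Exercise 37 is a determinant computation on the three edge vectors from P. A minimal Python sketch (the function name `tetra_volume` is hypothetical) uses exact rational arithmetic so the answers come out as fractions:

```python
from fractions import Fraction


def tetra_volume(P, Q, R, S):
    """Volume of tetrahedron PQRS as (1/6)|a . (b x c)| with a = PQ, b = PR, c = PS."""
    a = [q - p for q, p in zip(Q, P)]
    b = [r - p for r, p in zip(R, P)]
    c = [s - p for s, p in zip(S, P)]
    # a . (b x c) equals the 3 x 3 determinant with rows a, b, c
    # (cofactor expansion along the row a).
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return Fraction(abs(det), 6)


print(tetra_volume((-1, 2, 0), (2, 1, -3), (1, 1, 1), (3, -2, 3)))  # 17/6
print(tetra_volume((0, 0, 0), (1, 2, -1), (3, 4, 0), (-1, -3, 4)))  # 1/2
```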

38. Prove part (d) of Theorem 3.5.1. [Hint: First prove the result in the case where w = i = (1, 0, 0), then when

w = j = (0, 1, 0), and then when w = k = (0, 0, 1). Finally, prove it for an arbitrary vector $w = (w_1, w_2, w_3)$
by writing $w = w_1 i + w_2 j + w_3 k$.]

39. Prove part (e) of Theorem 3.5.1. [Hint: Apply part (a) of Theorem 3.5.2 to the result in part (d) of Theorem
3.5.1.]

40. Prove: 

(a) Prove (b) of Theorem 3.5.2. 

(b) Prove (c) of Theorem 3.5.2. 

(c) Prove (d) of Theorem 3.5.2.

(d) Prove (e) of Theorem 3.5.2.

(e) Prove (f) of Theorem 3.5.2.

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) The cross product of two nonzero vectors u and v is a nonzero vector if and only if u and v are not parallel. 
Answer: 

True 

(b) A normal vector to a plane can be obtained by taking the cross product of two nonzero and noncollinear vectors 
lying in the plane. 

Answer: 

True 

(c) The scalar triple product of u, v, and w determines a vector whose length is equal to the volume of the 
parallelepiped determined by u, v, and w. 


Answer: 


False 

(d) If u and v are vectors in 3-space, then ||v × u|| is equal to the area of the parallelogram determined by u and v.
Answer: 

True 

(e) For all vectors u, v, and w in 3-space, the vectors (u × v) × w and u × (v × w) are the same.

Answer: 

False 

(f) If u, v, and w are vectors in R³, where u is nonzero and u × v = u × w, then v = w.

Answer: 

False 
Supplementary Exercises

1. Let u = (-2, 0, 4), v = (3, -1, 6), and w = (2, -5, -5). Compute

(a) 3v - 2u

(b) ||u + v + w||

(c) the distance between -3u and v + 5w

(d) $\text{proj}_w u$

(e) u · (v × w)

(f) (-5v + w) × ((u · v)w)

Answer:

(a) 3v - 2u = (13, -3, 10)

(b) ||u + v + w|| = √70

(c) √774

(d) $\text{proj}_w u = -\frac{12}{27}(2, -5, -5)$

(e) u · (v × w) = -122

(f) (-5v + w) × ((u · v)w) = (-3150, -2430, 1170)
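Each part of Exercise 1 is a composition of a few vector operations. The following Python sketch (helper names are illustrative assumptions) evaluates all six parts with exact integer and rational arithmetic; for (b) and (c) it reports squared lengths so that no rounding is involved.

```python
from fractions import Fraction


def add(*vs):
    """Componentwise sum of any number of vectors of equal length."""
    return tuple(sum(cs) for cs in zip(*vs))


def scale(k, v):
    """Scalar multiple k v."""
    return tuple(k * c for c in v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(v, w):
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


u, v, w = (-2, 0, 4), (3, -1, 6), (2, -5, -5)

a = add(scale(3, v), scale(-2, u))                    # (a) 3v - 2u
s = add(u, v, w)
b_sq = dot(s, s)                                      # (b) ||u + v + w||^2
diff = add(scale(-3, u), scale(-1, add(v, scale(5, w))))
c_sq = dot(diff, diff)                                # (c) squared distance between -3u and v + 5w
coef = Fraction(dot(u, w), dot(w, w))                 # (d) proj_w u = coef * w
e = dot(u, cross(v, w))                               # (e) u . (v x w)
f = cross(add(scale(-5, v), w), scale(dot(u, v), w))  # (f)

print(a, b_sq, c_sq, coef, e, f)
```

The coefficient in (d) is u · w / ||w||² = -24/54, which `Fraction` reduces to -4/9. The helpers `add`, `scale`, and `dot` are dimension-independent, so they also apply unchanged to the vectors in R⁴ and R⁵ of Exercises 3 and 4.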

2. Repeat Exercise 1 for the vectors u = 3i - 5j + k, v = -2i + 2k, and w = -j + 4k.

3. Repeat parts (a)-(d) of Exercise 1 for the vectors u = (-2, 6, 2, 1), v = (-3, 0, 8, 0), and
w = (9, 1, -6, -6).

Answer:

(a) 3v - 2u = (-5, -12, 20, -2)

(b) ||u + v + w|| = √106

(c) √2810

(d) $\text{proj}_w u = -\frac{15}{77}(9, 1, -6, -6)$

4. Repeat parts (a)-(d) of Exercise 1 for the vectors u = (0, 5, 0, -1, -2), v = (1, -1, 6, -2, 0), and
w = (-4, -1, 4, 0, 2).

In Exercises 5-6, determine whether the given set of vectors forms an orthogonal set. If so, normalize each 
vector to form an orthonormal set. 


5. (-32, -1, 19), (3, -1, 5), (1, 6, 2)


Answer: 


Not an orthogonal set 
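A set is orthogonal when every pair of distinct vectors in it has dot product zero, so the check in Exercises 5-6 is a loop over pairs. The Python sketch below (function names are hypothetical) confirms the answer to Exercise 5 and shows which pair fails; for a set that does pass, `normalize` rescales each vector to unit length.

```python
import math
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(vectors):
    """True if every pair of distinct vectors in the list has dot product 0."""
    return all(dot(u, v) == 0 for u, v in combinations(vectors, 2))


def normalize(v):
    """Scale a nonzero vector v to unit length."""
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


ex5 = [(-32, -1, 19), (3, -1, 5), (1, 6, 2)]
print(is_orthogonal_set(ex5))   # False
print(dot(ex5[1], ex5[2]))      # 7, the pair that breaks orthogonality
```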





6. (-2, 0, 1), (1, 1, 2), (1, -5, 2)

7. (a) The set of all vectors in R² that are orthogonal to a nonzero vector is what kind of geometric object?

(b) The set of all vectors in R³ that are orthogonal to a nonzero vector is what kind of geometric object?

(c) The set of all vectors in R² that are orthogonal to two noncollinear vectors is what kind of geometric
object?

(d) The set of all vectors in R³ that are orthogonal to two noncollinear vectors is what kind of geometric
object?

Answer: 


(a) A line through the origin, perpendicular to the given vector. 

(b) A plane through the origin, perpendicular to the given vector. 

(c) {0} (the origin) 

(d) A line through the origin, perpendicular to the plane containing the two noncollinear vectors. 


m | y 12 2^ 

Show that vi = I —, —, — | and V 2 = | —, —, — — 1 are orthonormal vectors, and find a third vector V 3 for 
which {vi, V 2 , V 3 ) is an orthonormal set. 

9 9 9 

9. True or False: If u and v are nonzero vectors such that ||u + v||² = ||u||² + ||v||², then u and v are
orthogonal.


Answer: 


True 

10. True or False: If u is orthogonal to v + w, then u is orthogonal to v and w.

11. Consider the points P(3, -1, 4), Q(6, 0, 2), and R(5, 1, 1). Find the point S in R³ whose first
component is -1 and such that PQ is parallel to RS.


Answer:

S(-1, -1, 5)

12. Consider the points P(-3, 1, 0, 6), Q(0, 5, 1, -2), and R(-4, 1, 4, 0). Find the point S in R⁴ whose
third component is 6 and such that PQ is parallel to RS.

13. Using the points in Exercise 11, find the cosine of the angle between the vectors PQ and PR.


Answer: 



14. Using the points in Exercise 12, find the cosine of the angle between the vectors PQ and PR.

15. Find the distance between the point P(-3, 1, 3) and the plane 5x + z = 3y - 4.


Answer:

11/√35
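Exercise 15 applies the point-to-plane distance formula D = |ax₀ + by₀ + cz₀ + d| / √(a² + b² + c²), which requires the plane to be written first as ax + by + cz + d = 0. A minimal Python sketch (the function name is illustrative):

```python
import math


def distance_point_plane(p, a, b, c, d):
    """Distance from p = (x0, y0, z0) to the plane ax + by + cz + d = 0."""
    x0, y0, z0 = p
    return abs(a * x0 + b * y0 + c * z0 + d) / math.sqrt(a * a + b * b + c * c)


# Exercise 15: rewrite 5x + z = 3y - 4 as 5x - 3y + z + 4 = 0.
dist = distance_point_plane((-3, 1, 3), 5, -3, 1, 4)
print(dist, 11 / math.sqrt(35))  # the same value, about 1.859
```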

16. Show that the planes 3x - y + 6z = 7 and -6x + 2y - 12z = 1 are parallel, and find the distance
between the planes.

In Exercises 17-22, find vector and parametric equations for the line or plane in question. 

17. The plane in R³ that contains the points P(-2, 1, 3), Q(-1, -1, 1), and R(3, 0, -2).

Answer:

Vector equation: $(x, y, z) = (-2, 1, 3) + t_1(1, -2, -2) + t_2(5, -1, -5)$;

parametric equations: $x = -2 + t_1 + 5t_2,\quad y = 1 - 2t_1 - t_2,\quad z = 3 - 2t_1 - 5t_2$

18. The line in R³ that contains the point P(-1, 6, 0) and is orthogonal to the plane 4x - z = 5.

19. The line in R² that is parallel to the vector v = (8, -1) and contains the point P(0, -3).

Answer:

Vector equation: (x, y) = (0, -3) + t(8, -1);

parametric equations: x = 8t, y = -3 - t

20. The plane in R³ that contains the point P(-2, 1, 0) and is parallel to the plane _8x I - z = 4.

21. The line in R² with equation y = 3x - 5.

Answer:

A possible answer is vector equation: (x, y) = (0, -5) + t(1, 3); parametric equations:

x = t, y = -5 + 3t

22. The plane in R³ with equation 2x - 6y + 3z = 5.

In Exercises 23-25, find a point-normal equation for the given plane.

23. The plane that is represented by the vector equation
$(x, y, z) = (-1, 5, 6) + t_1(0, -1, 3) + t_2(2, -1, 0)$.

Answer:

3(x + 1) + 6(y - 5) + 2(z - 6) = 0

24. The plane that contains the point P(-5, 1, 0) and is orthogonal to the line with parametric equations
x = 3 - 5t, y = 2t, and z = 7.

25. The plane that passes through the points P(9, 0, 4), Q(-1, 4, 3), and R(0, 6, -2).

Answer:

-18(x - 9) - 51y - 24(z - 4) = 0
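For Exercises 23 and 25 a normal vector comes from a cross product: of the two direction vectors in Exercise 23, or of PQ and PR in Exercise 25. This Python sketch (function names are hypothetical) computes n and checks that all three given points satisfy n · (X - P) = 0.

```python
def cross(v, w):
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def point_normal_plane(P, Q, R):
    """Normal n = PQ x PR for the plane through P, Q, R; the plane is n . (X - P) = 0."""
    PQ = tuple(q - p for q, p in zip(Q, P))
    PR = tuple(r - p for r, p in zip(R, P))
    return cross(PQ, PR)


P, Q, R = (9, 0, 4), (-1, 4, 3), (0, 6, -2)
n = point_normal_plane(P, Q, R)
print(n)  # (-18, -51, -24), i.e. -18(x - 9) - 51(y - 0) - 24(z - 4) = 0

# Sanity check: every given point satisfies n . (X - P) = 0.
print(all(dot(n, tuple(x - p for x, p in zip(X, P))) == 0 for X in (P, Q, R)))
```

For Exercise 23, `cross((0, -1, 3), (2, -1, 0))` gives (3, 6, 2), the coefficients in the answer 3(x + 1) + 6(y - 5) + 2(z - 6) = 0.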



26. Suppose that $\{v_1, v_2, v_3\}$ and $\{w_1, w_2\}$ are two sets of vectors such that $v_i$ and $w_j$ are orthogonal for
all i and j. Prove that if $a_1, a_2, a_3, b_1, b_2$ are any scalars, then the vectors $v = a_1v_1 + a_2v_2 + a_3v_3$ and
$w = b_1w_1 + b_2w_2$ are orthogonal.

27. Prove that if two vectors u and v in R² are orthogonal to a nonzero vector w in R², then u and v are scalar
multiples of each other.

28. Prove that ||u + v|| = ||u|| + ||v|| if and only if u and v are parallel vectors.

29. The equation Ax + By = 0 represents a line through the origin in R² if A and B are not both zero. What
does this equation represent in R³ if you think of it as Ax + By + 0z = 0? Explain.

Answer: 

A plane 

INTRODUCTION

Recall that we began our study of vectors by viewing them as directed line segments 
(arrows). We then extended this idea by introducing rectangular coordinate systems, which 
enabled us to view vectors as ordered pairs and ordered triples of real numbers. As we 
developed properties of these vectors we noticed patterns in various formulas that enabled 
us to extend the notion of a vector to an n-tuple of real numbers. Although n-tuples took
us outside the realm of our “visual experience,” they gave us a valuable tool for
understanding and studying systems of linear equations. In this chapter we will extend the
concept of a vector yet again by using the most important algebraic properties of vectors
in Rⁿ as axioms. These axioms, if satisfied by a set of objects, will enable us to think of
those objects as vectors.
4.1 Real Vector Spaces

In this section we will extend the concept of a vector by using the basic properties of vectors in Rⁿ as axioms, which, if satisfied
by a set of objects, guarantee that those objects behave like familiar vectors.


Vector Space Axioms 

The following definition consists of ten axioms, eight of which are properties of vectors in Rⁿ that were stated in Theorem 3.1.1.
It is important to keep in mind that one does not prove axioms; rather, they are assumptions that serve as the starting point for 
proving theorems. 

Vector space scalars can be real numbers or complex 
numbers. Vector spaces with real scalars are called real 
vector spaces and those with complex scalars are called 
complex vector spaces. For now we will be concerned 
exclusively with real vector spaces. We will consider 
complex vector spaces later. 




DEFINITION 1 

Let V be an arbitrary nonempty set of objects on which two operations are defined: addition, and multiplication by
scalars. By addition we mean a rule for associating with each pair of objects u and v in V an object u + v, called the
sum of u and v; by scalar multiplication we mean a rule for associating with each scalar k and each object u in V an
object ku, called the scalar multiple of u by k. If the following axioms are satisfied by all objects u, v, w in V and all
scalars k and m, then we call V a vector space and we call the objects in V vectors.

1. If u and v are objects in V, then u + v is in V.

2. u + v = v + u

3. u + (v + w) = (u + v) + w

4. There is an object 0 in V, called a zero vector for V, such that 0 + u = u + 0 = u for all u in V.

5. For each u in V, there is an object -u in V, called a negative of u, such that u + (-u) = (-u) + u = 0.

6. If k is any scalar and u is any object in V, then ku is in V.

7. k(u + v) = ku + kv

8. (k + m)u = ku + mu

9. k(mu) = (km)(u)

10. 1u = u


Observe that the definition of a vector space does not specify the nature of the vectors or the operations. Any kind of object can
be a vector, and the operations of addition and scalar multiplication need not have any relationship to those on Rⁿ. The only
requirement is that the ten vector space axioms be satisfied. In the examples that follow we will use four basic steps to show
that a set with two operations is a vector space.



To Show that a Set with Two Operations is a Vector Space 

Step 1 Identify the set V of objects that will become vectors.


Step 2 Identify the addition and scalar multiplication operations on V. 

Step 3 Verify Axioms 1 and 6; that is, adding two vectors in V produces a vector in V, and multiplying a vector in V by
a scalar also produces a vector in V. Axiom 1 is called closure under addition, and Axiom 6 is called closure under
scalar multiplication.

Step 4 Confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold. 





Hermann Gunther Grassmann (1809-1877) 

The notion of an “abstract vector space” evolved over many years and had many contributors. The 
idea crystallized with the work of the German mathematician H. G. Grassmann, who published a paper in 1862 in which 
he considered abstract systems of unspecified elements on which he defined formal operations of addition and scalar 
multiplication. Grassmann’s work was controversial, and others, including Augustin Cauchy (p. 137), laid reasonable 
claim to the idea. 

[Image: (c)Sueddeutsche Zeitung Photo/The Image Works] 


Our first example is the simplest of all vector spaces in that it contains only one object. Since Axiom 4 requires that every 
vector space contain a zero vector, the object will have to be that vector. 

EXAMPLE 1 The Zero Vector Space 

Let V consist of a single object, which we denote by 0, and define 

0 + 0 = 0 and k0 = 0

for all scalars k. It is easy to check that all the vector space axioms are satisfied. We call this the zero vector 
space. 


Our second example is one of the most important of all vector spaces, the familiar space Rⁿ. It should not be surprising that
the operations on Rⁿ satisfy the vector space axioms because those axioms were based on known properties of operations on Rⁿ.


EXAMPLE 2 Rⁿ Is a Vector Space

Let V = Rⁿ, and define the vector space operations on V to be the usual operations of addition and scalar
multiplication of n-tuples; that is,

$u + v = (u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n) = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n)$

$ku = (ku_1, ku_2, \ldots, ku_n)$

The set V = Rⁿ is closed under addition and scalar multiplication because the foregoing operations produce
n-tuples as their end result, and these operations satisfy Axioms 2, 3, 4, 5, 7, 8, 9, and 10 by virtue of Theorem
3.1.1.


Our next example is a generalization of Rⁿ in which we allow vectors to have infinitely many components.

EXAMPLE 3 The Vector Space of Infinite Sequences of Real Numbers 

Let V consist of objects of the form

$u = (u_1, u_2, \ldots, u_n, \ldots)$

in which $u_1, u_2, \ldots, u_n, \ldots$ is an infinite sequence of real numbers. We define two infinite sequences to be equal if
their corresponding components are equal, and we define addition and scalar multiplication componentwise by

$u + v = (u_1, u_2, \ldots, u_n, \ldots) + (v_1, v_2, \ldots, v_n, \ldots) = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n, \ldots)$

$ku = (ku_1, ku_2, \ldots, ku_n, \ldots)$

We leave it as an exercise to confirm that V with these operations is a vector space. We will denote this vector
space by the symbol R^∞.


In the next example our vectors will be matrices. This may be a little confusing at first because matrices are composed of rows 
and columns, which are themselves vectors (row vectors and column vectors). However, here we will not be concerned with the 
individual rows and columns but rather with the properties of the matrix operations as they relate to the matrix as a whole. 

Note that Equation 1 involves three different addition 
operations: the addition operation on vectors, the 
addition operation on matrices, and the addition 
operation on real numbers. 


EXAMPLE 4 A Vector Space of 2 x 2 Matrices 


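Let V be the set of 2 × 2 matrices with real entries, and take the vector space operations on V to be the usual
operations of matrix addition and scalar multiplication; that is,

$u + v = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} u_{11} + v_{11} & u_{12} + v_{12} \\ u_{21} + v_{21} & u_{22} + v_{22} \end{bmatrix}$

$ku = k\begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} ku_{11} & ku_{12} \\ ku_{21} & ku_{22} \end{bmatrix}$     (1)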


The set V is closed under addition and scalar multiplication because the foregoing operations produce 2 × 2
matrices as the end result. Thus, it remains to confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold. Some of these
are standard properties of matrix operations. For example, Axiom 2 follows from Theorem 1.4.1a since

$u + v = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} + \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = v + u$

Similarly, Axioms 3, 7, 8, and 9 follow from parts (b), (h), (j), and (l), respectively, of that theorem (verify). This
leaves Axioms 4, 5, and 10 to be verified.


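To confirm that Axiom 4 is satisfied, we must find a 2 × 2 matrix 0 in V for which u + 0 = 0 + u = u for all 2 × 2
matrices in V. We can do this by taking

$0 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

With this definition,

$0 + u = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = u$

and similarly u + 0 = u. To verify that Axiom 5 holds we must show that each object u in V has a negative -u in
V such that u + (-u) = 0 and (-u) + u = 0. This can be done by defining the negative of u to be

$-u = \begin{bmatrix} -u_{11} & -u_{12} \\ -u_{21} & -u_{22} \end{bmatrix}$

With this definition,

$u + (-u) = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} -u_{11} & -u_{12} \\ -u_{21} & -u_{22} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0$

and similarly (-u) + u = 0. Finally, Axiom 10 holds because

$1u = 1\begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = u$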


EXAMPLE 5 The Vector Space of m x n Matrices 

Example 4 is a special case of a more general class of vector spaces. You should have no trouble adapting the 
argument used in that example to show that the set V of all m x n matrices with the usual matrix operations of 
addition and scalar multiplication is a vector space. We will denote this vector space by the symbol $M_{mn}$. Thus,
for example, the vector space in Example 4 is denoted as $M_{22}$.


In Example 6 the functions were defined on the entire
interval (-∞, ∞). However, the arguments used in
that example apply as well on all subintervals of
(-∞, ∞), such as a closed interval [a, b] or an open
interval (a, b). We will denote the vector spaces of
functions on these intervals by F[a, b] and F(a, b),
respectively.

EXAMPLE 6 The Vector Space of Real-Valued Functions 

Let V be the set of real-valued functions that are defined at each x in the interval (-∞, ∞). If f = f(x) and
g = g(x) are two functions in V and if k is any scalar, then define the operations of addition and scalar
multiplication by


(f + g)(x) = f(x) + g(x)     (2)

(kf)(x) = kf(x)     (3)

One way to think about these operations is to view the numbers f(x) and g(x) as “components” of f and g at the
point x, in which case Equations 2 and 3 state that two functions are added by adding corresponding components,
and a function is multiplied by a scalar by multiplying each component by that scalar, exactly as in Rⁿ and R^∞.
This idea is illustrated in parts (a) and (b) of Figure 4.1.1. The set V with these operations is denoted by the
symbol F(-∞, ∞). We can prove that this is a vector space as follows:



















Axioms 1 and 6 These closure axioms require that sums and scalar multiples of functions that are defined at each x in
the interval (-∞, ∞) also be defined at each x in the interval (-∞, ∞). This follows from Formulas 2 and 3.

Axiom 4 This axiom requires that there exists a function 0 in F(-∞, ∞), which when added to any other
function f in F(-∞, ∞) produces f back again as the result. The function whose value at every point x in the
interval (-∞, ∞) is zero has this property. Geometrically, the graph of the function 0 is the line that
coincides with the x-axis.

Axiom 5 This axiom requires that for each function f in F(-∞, ∞) there exists a function -f in

F(-∞, ∞), which when added to f produces the function 0. The function defined by (-f)(x) = -f(x) has

this property. The graph of -f can be obtained by reflecting the graph of f about the x-axis (Figure 4.1.1c).

Axioms 2, 3, 7, 8, 9, 10 The validity of each of these axioms follows from properties of real numbers. For example,
if f and g are functions in F(-∞, ∞), then Axiom 2 requires that f + g = g + f. This follows from the
computation

(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x)

in which the first and last equalities follow from (2), and the middle equality is a property of real numbers. We will
leave the proofs of the remaining parts as exercises.
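Formulas 2 and 3 translate directly into code if a "vector" of F(-∞, ∞) is modeled as a Python function from floats to floats. The sketch below is an illustration under that assumption (all names are hypothetical): addition and scalar multiplication return new functions, and the zero vector and negatives of Axioms 4 and 5 are ordinary functions too.

```python
from typing import Callable

# A "vector" in F(-inf, inf) is modeled here as a Python function float -> float.
RealFunction = Callable[[float], float]


def f_add(f: RealFunction, g: RealFunction) -> RealFunction:
    """Vector addition, Formula (2): (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def f_scale(k: float, f: RealFunction) -> RealFunction:
    """Scalar multiplication, Formula (3): (kf)(x) = k f(x)."""
    return lambda x: k * f(x)


def zero(x: float) -> float:
    """The zero vector of Axiom 4: the function that is 0 everywhere."""
    return 0.0


def f_neg(f: RealFunction) -> RealFunction:
    """The negative of Axiom 5: (-f)(x) = -f(x)."""
    return f_scale(-1.0, f)


def f(x: float) -> float:
    return x ** 2


def g(x: float) -> float:
    return 3 * x + 1


h = f_add(f, g)
print(h(2.0))                    # 11.0, since f(2) + g(2) = 4 + 7
print(f_add(f, f_neg(f))(5.0))   # 0.0, matching zero(5.0)
```

A program can only compare two functions at finitely many sample points, so checks like these illustrate the axioms but do not prove them; the proofs rest on the real-number properties described above.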



Figure 4.1.1 (c): the graphs of f and -f


It is important to recognize that you cannot impose any two operations on any set V and expect the vector space axioms to hold.
For example, if V is the set of n-tuples with positive components, and if the standard operations from Rⁿ are used, then V is not
closed under scalar multiplication, because if u is a nonzero n-tuple in V, then (-1)u has at least one negative component and
hence is not in V. The following is a less obvious example in which only one of the ten vector space axioms fails to hold.

EXAMPLE 7 A Set That Is Not a Vector Space 

Let V = R² and define addition and scalar multiplication operations as follows: If $u = (u_1, u_2)$ and $v = (v_1, v_2)$,
then define

$u + v = (u_1 + v_1, u_2 + v_2)$

and if k is any real number, then define

$ku = (ku_1, 0)$

For example, if u = (2, 4), v = (-3, 5), and k = 7, then

u + v = (2 + (-3), 4 + 5) = (-1, 9)
ku = 7u = (7 · 2, 0) = (14, 0)

The addition operation is the standard one from R², but the scalar multiplication is not. In the exercises we will
ask you to show that the first nine vector space axioms are satisfied. However, Axiom 10 fails to hold for certain
vectors. For example, if $u = (u_1, u_2)$ is such that $u_2 \neq 0$, then

$1u = 1(u_1, u_2) = (1 \cdot u_1, 0) = (u_1, 0) \neq u$
Thus, V is not a vector space with the stated operations.











Our final example will be an unusual vector space that we have included to illustrate how varied vector spaces can be. Since the 
objects in this space will be real numbers, it will be important for you to keep track of which operations are intended as vector 
operations and which ones as ordinary operations on real numbers. 

EXAMPLE 8 An Unusual Vector Space 

Let V be the set of positive real numbers, and define the operations on V to be

u + v = uv     [Vector addition is numerical multiplication.]

ku = u^k     [Scalar multiplication is numerical exponentiation.]

Thus, for example, 1 + 1 = 1 and (2)(1) = 1² = 1, strange indeed, but nevertheless the set V with these

operations satisfies the 10 vector space axioms and hence is a vector space. We will confirm Axioms 4, 5, and 7,
and leave the others as exercises.

Axiom 4 The zero vector in this space is the number 1 (i.e., 0 = 1) since

u + 1 = u · 1 = u

Axiom 5 The negative of a vector u is its reciprocal (i.e., -u = 1/u) since

u + (1/u) = u(1/u) = 1 (= 0)

Axiom 7 k(u + v) = (uv)^k = u^k v^k = (ku) + (kv)
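A computer cannot prove the axioms, but it can hunt for counterexamples, which is often the quickest way to spot an axiom that fails. The Python sketch below is a hypothetical helper (not a procedure from the text) that tests Axioms 2-5 and 7-10 on randomly sampled vectors and scalars, comparing floats with a tolerance; closure (Axioms 1 and 6) is left to a separate argument. Applied to Examples 7 and 8, it flags exactly Axiom 10 for Example 7 and nothing for Example 8.

```python
import math
import random


def spot_check(add, smul, zero, neg, sample, eq, trials=200, seed=0):
    """Return the sorted axiom numbers (among 2-5, 7-10) that fail on some random sample.

    A failure disproves an axiom; an empty result is evidence, not a proof.
    """
    rng = random.Random(seed)
    failed = set()
    for _ in range(trials):
        u, v, w = sample(rng), sample(rng), sample(rng)
        k, m = rng.uniform(-3, 3), rng.uniform(-3, 3)
        checks = {
            2: eq(add(u, v), add(v, u)),
            3: eq(add(u, add(v, w)), add(add(u, v), w)),
            4: eq(add(zero, u), u) and eq(add(u, zero), u),
            5: eq(add(u, neg(u)), zero) and eq(add(neg(u), u), zero),
            7: eq(smul(k, add(u, v)), add(smul(k, u), smul(k, v))),
            8: eq(smul(k + m, u), add(smul(k, u), smul(m, u))),
            9: eq(smul(k, smul(m, u)), smul(k * m, u)),
            10: eq(smul(1, u), u),
        }
        failed.update(n for n, ok in checks.items() if not ok)
    return sorted(failed)


def pair_eq(a, b):
    return all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9) for x, y in zip(a, b))


# Example 7: standard addition on R^2, but ku = (k u1, 0).
ex7 = spot_check(
    add=lambda u, v: (u[0] + v[0], u[1] + v[1]),
    smul=lambda k, u: (k * u[0], 0.0),
    zero=(0.0, 0.0),
    neg=lambda u: (-u[0], -u[1]),
    sample=lambda rng: (rng.uniform(-10, 10), rng.uniform(-10, 10)),
    eq=pair_eq,
)

# Example 8: positive reals with u + v = uv and ku = u^k; zero vector 1, negative 1/u.
ex8 = spot_check(
    add=lambda u, v: u * v,
    smul=lambda k, u: u ** k,
    zero=1.0,
    neg=lambda u: 1.0 / u,
    sample=lambda rng: rng.uniform(0.1, 10.0),
    eq=lambda a, b: math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12),
)

print(ex7, ex8)  # [10] []
```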


Some Properties of Vectors 

The following is our first theorem about general vector spaces. As you will see, its proof is very formal with each step being 
justified by a vector space axiom or a known property of real numbers. There will not be many rigidly formal proofs of this type 
in the text, but we have included these to reinforce the idea that the familiar properties of vectors can all be derived from the 
vector space axioms. 


THEOREM 4.1.1 

Let V be a vector space, u a vector in V, and k a scalar; then:

(a) 0u = 0

(b) k0 = 0

(c) (-1)u = -u

(d) If ku = 0, then k = 0 or u = 0.


We will prove parts (a) and (c) and leave proofs of the remaining parts as exercises.
To prove (a), we can write


0u + 0u = (0 + 0)u     [Axiom 8]

= 0u     [Property of the number 0]


By Axiom 5 the vector 0u has a negative, -0u. Adding this negative to both sides above yields

[0u + 0u] + (-0u) = 0u + (-0u)
or

0u + [0u + (-0u)] = 0u + (-0u)     [Axiom 3]

0u + 0 = 0     [Axiom 5]

0u = 0     [Axiom 4]


To prove (c), that is, (-1)u = -u, we must show that u + (-1)u = 0. The proof is as follows:


u + (-1)u = 1u + (-1)u     [Axiom 10]
= (1 + (-1))u     [Axiom 8]
= 0u     [Property of numbers]
= 0     [Part (a) of this theorem]


A Closing Observation 

This section of the text is very important to the overall plan of linear algebra in that it establishes a common thread between
such diverse mathematical objects as geometric vectors, vectors in Rⁿ, infinite sequences, matrices, and real-valued functions,
to name a few. As a result, whenever we discover a new theorem about general vector spaces, we will at the same time be
discovering a theorem about geometric vectors, vectors in Rⁿ, sequences, matrices, real-valued functions, and about any new
kinds of vectors that we might discover.

To illustrate this idea, consider what the rather innocent-looking result in part (a) of Theorem 4.1.1 says about the vector space 
in Example 8. Keeping in mind that the vectors in that space are positive real numbers, that scalar multiplication means 
numerical exponentiation, and that the zero vector is the number 1, the equation 

0u = 0 

is a statement of the fact that if u is a positive real number, then 

u⁰ = 1


Concept Review 

Vector space 

Closure under addition 

Closure under scalar multiplication 

Examples of vector spaces 

Skills 

Determine whether a given set with two operations is a vector space. 

Show that a set with two operations is not a vector space by demonstrating that at least one of the vector space axioms 
fails. 


Exercise Set 4.1 


1. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations
on $u = (u_1, u_2)$ and $v = (v_1, v_2)$:

$u + v = (u_1 + v_1, u_2 + v_2), \quad ku = (0, ku_2)$

(a) Compute u + v and ku for u = (-1, 2), v = (3, 4), and k = 3.

(b) In words, explain why V is closed under addition and scalar multiplication.

(c) Since addition on V is the standard addition operation on R², certain vector space axioms hold for V because they are
known to hold for R². Which axioms are they?

(d) Show that Axioms 7, 8, and 9 hold.

(e) Show that Axiom 10 fails and hence that V is not a vector space under the given operations.

Answer:

(a) u + v = (2, 6), 3u = (0, 6)

(c) Axioms 1-5

2. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations
on $u = (u_1, u_2)$ and $v = (v_1, v_2)$:

$u + v = (u_1 + v_1 + 1, u_2 + v_2 + 1), \quad ku = (ku_1, ku_2)$

(a) Compute u + v and ku for u = (0, 4), v = (1, -3), and k = 2.

(b) Show that (0, 0) ≠ 0.

(c) Show that (-1, -1) = 0.

(d) Show that Axiom 5 holds by producing an ordered pair -u such that u + (-u) = 0 for $u = (u_1, u_2)$.

(e) Find two vector space axioms that fail to hold.

In Exercises 3-12, determine whether each set equipped with the given operations is a vector space. For those that are not 
vector spaces identify the vector space axioms that fail. 

3. The set of all real numbers with the standard operations of addition and multiplication. 

Answer: 

The set is a vector space with the given operations. 

4. The set of all pairs of real numbers of the form (x, 0) with the standard operations on R².

5. The set of all pairs of real numbers of the form (x, y), where x > 0, with the standard operations on R².

Answer:

Not a vector space. Axioms 5 and 6 fail.

6. The set of all n-tuples of real numbers that have the form (x, x, ..., x) with the standard operations on Rⁿ.

7. The set of all triples of real numbers with the standard vector addition but with scalar multiplication defined by

$k(x, y, z) = (k^2x, k^2y, k^2z)$

Answer:

Not a vector space. Axiom 8 fails.

8. The set of all 2 x 2 invertible matrices with the standard matrix addition and scalar multiplication. 

9. The set of all 2 × 2 matrices of the form

$\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$

with the standard matrix addition and scalar multiplication.

Answer: 

The set is a vector space with the given operations. 

10. The set of all real-valued functions f defined everywhere on the real line and such that f(1) = 0 with the operations used in
Example 6.

11. The set of all pairs of real numbers of the form (1, x) with the operations

(1, y) + (1, y') = (1, y + y') and k(1, y) = (1, ky)

Answer:

The set is a vector space with the given operations.

12. The set of polynomials of the form $a_0 + a_1x$ with the operations

$(a_0 + a_1x) + (b_0 + b_1x) = (a_0 + b_0) + (a_1 + b_1)x$

and

$k(a_0 + a_1x) = (ka_0) + (ka_1)x$

13. Verify Axioms 3, 7, 8, and 9 for the vector space given in Example 4. 

14. Verify Axioms 1, 2, 3, 7, 8, 9, and 10 for the vector space given in Example 6. 

15. With the addition and scalar multiplication operations defined in Example 7, show that V = R² satisfies Axioms 1-9.

16. Verify Axioms 1, 2, 3, 6, 8, 9, and 10 for the vector space given in Example 8. 

17. Show that the set of all points in R² lying on a line is a vector space with respect to the standard operations of vector
addition and scalar multiplication if and only if the line passes through the origin.

18. Show that the set of all points in R³ lying in a plane is a vector space with respect to the standard operations of vector
addition and scalar multiplication if and only if the plane passes through the origin.

In Exercises 19-21, prove that the given set with the stated operations is a vector space.

19. The set V = {0} with the operations of addition and scalar multiplication given in Example 1.

20. The set R^∞ of all infinite sequences of real numbers with the operations of addition and scalar multiplication given in
Example 3.

21. The set $M_{mn}$ of all m × n matrices with the usual operations of addition and scalar multiplication.

22. Prove part (d) of Theorem 4.1.1.

23. The argument that follows proves that if u, v, and w are vectors in a vector space V such that u + w = v + w, then u = v
(the cancellation law for vector addition). As illustrated, justify the steps by filling in the blanks.

u + w = v + w     Hypothesis

(u + w) + (-w) = (v + w) + (-w)     Add -w to both sides.

u + [w + (-w)] = v + [w + (-w)]     _______

u + 0 = v + 0     _______

u = v     _______

24. Let v be any vector in a vector space V. Prove that 0v = 0.

25. Below is a seven-step proof of part (b) of Theorem 4.1.1. Justify each step either by stating that it is true by hypothesis or by
specifying which of the ten vector space axioms applies.

Hypothesis: Let u be any vector in a vector space V, let 0 be the zero vector in V, and let k be a scalar.


Conclusion: Then k0 = 0.



Proof:


(1) k0 + ku = k(0 + u)

(2) = ku

(3) Since ku is in V, -ku is in V.

(4) Therefore, (k0 + ku) + (-ku) = ku + (-ku).

(5) k0 + (ku + (-ku)) = ku + (-ku)

(6) k0 + 0 = 0

(7) k0 = 0

26. Let v be any vector in a vector space V. Prove that — v = (— l)v. 

27. Prove: If u is a vector in a vector space V and k a scalar such that ku = 0, then either k = 0 or u = 0. [Suggestion: Show
that if ku = 0 and k ≠ 0, then u = 0. The result then follows as a logical consequence of this.]

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) A vector is a directed line segment (an arrow). 

Answer: 

False 

(b) A vector is an n-tuple of real numbers.

Answer: 

False 

(c) A vector is any element of a vector space. 

Answer: 

True 

(d) There is a vector space consisting of exactly two distinct vectors. 

Answer: 

False 

(e) The set of polynomials with degree exactly 1 is a vector space under the operations defined in Exercise 12. 

Answer: 

False 
4.2 Subspaces

It is possible for one vector space to be contained within another. We will explore this idea in this section,
discuss how to recognize such vector spaces, and give a variety of examples that will be used in
our later work.

We will begin with some terminology. 


DEFINITION 1 

A subset W of a vector space V is called a subspace of V if W is itself a vector space under the addition
and scalar multiplication defined on V.


In general, to show that a nonempty set W with two operations is a vector space one must verify the ten vector
space axioms. However, if W is a subspace of a known vector space V, then certain axioms need not be verified
because they are “inherited” from V. For example, it is not necessary to verify that u + v = v + u holds in W
because it holds for all vectors in V including those in W. On the other hand, it is necessary to verify that W is
closed under addition and scalar multiplication since it is possible that adding two vectors in W or multiplying a
vector in W by a scalar produces a vector in V that is outside of W (Figure 4.2.1).



The vectors u and v are in W, but the vectors u + v and ku are not

Those axioms that are not inherited by W are 

Axiom 1—Closure of W under addition 

Axiom 4—Existence of a zero vector in Ik 

Axiom 5—Existence of a negative in Ik for every vector in W 

Axiom 6—Closure of W under scalar multiplication 

so these must be verified to prove that W is a subspace of V. However, the following theorem shows that if
Axiom 1 and Axiom 6 hold in W, then Axioms 4 and 5 hold in W as a consequence and hence need not be
verified.


THEOREM 4.2.1 


If W is a set of one or more vectors in a vector space V, then W is a subspace of V if and only if the
following conditions hold.

(a) If u and v are vectors in W, then u + v is in W.

(b) If k is any scalar and u is any vector in W, then ku is in W.

In words, Theorem 4.2.1 states that W is a
subspace of V if and only if it is closed under
addition and scalar multiplication.

Proof If W is a subspace of V, then all the vector space axioms hold in W, including Axioms 1 and 6, which
are precisely conditions (a) and (b).

Conversely, assume that conditions (a) and (b) hold. Since these are Axioms 1 and 6, and since Axioms 2, 3, 7,
8, 9, and 10 are inherited from V, we only need to show that Axioms 4 and 5 hold in W. For this purpose, let u
be any vector in W. It follows from condition (b) that ku is a vector in W for every scalar k. In particular,

0u = 0 and (-1)u = -u are in W, which shows that Axioms 4 and 5 hold in W.

Note that every vector space has at least two 
subspaces, itself and its zero subspace. 


EXAMPLE 1 The Zero Subspace 

If V is any vector space, and if W= {0} is the subset of V that consists of the zero vector only, 
then W is closed under addition and scalar multiplication since 

0 + 0 = 0 and k0 = 0

for any scalar k. We call W the zero subspace of V. 


EXAMPLE 2 Lines Through the Origin Are Subspaces of $R^2$ and of $R^3$

If W is a line through the origin of either $R^2$ or $R^3$, then adding two vectors on the line W or multiplying a
vector on the line W by a scalar produces another vector on the line W, so W is closed under addition and scalar
multiplication (see Figure 4.2.2 for an illustration in $R^3$).




Figure 4.2.2 (a) W is closed under addition. (b) W is closed under scalar multiplication.


EXAMPLE 3 Planes Through the Origin Are Subspaces of $R^3$

If u and v are vectors in a plane W through the origin of $R^3$, then it is evident geometrically that u + v
and ku lie in the same plane W for any scalar k (Figure 4.2.3). Thus W is closed under addition and
scalar multiplication.



Figure 4.2.3 The vectors u + v and ku both lie in the same plane as u and v.

Table 1 lists the subspaces of $R^2$ and of $R^3$ that we have encountered thus far. We will see
later that these are the only subspaces of $R^2$ and of $R^3$.


Table 1

| Subspaces of $R^2$ | Subspaces of $R^3$ |
|---|---|
| {0} | {0} |
| Lines through the origin | Lines through the origin |
| $R^2$ | Planes through the origin |
| | $R^3$ |


EXAMPLE 4 A Subset of $R^2$ That Is Not a Subspace












Let W be the set of all points (x, y) in $R^2$ for which x ≥ 0 and y ≥ 0 (the shaded region in Figure
4.2.4). This set is not a subspace of $R^2$ because it is not closed under scalar multiplication. For
example, v = (1, 1) is a vector in W, but (-1)v = (-1, -1) is not.


Figure 4.2.4 W is not closed under scalar multiplication.
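The counterexample can be checked mechanically. The short Python sketch below uses illustrative helper names (`in_W`, `scale`, `add`, not from the text) and assumes W is described by x ≥ 0 and y ≥ 0; the same counterexample works if the inequalities are strict. A check like this can only exhibit a failure of closure; passing a few samples never proves that a set is a subspace.

```python
def in_W(p):
    """Membership test for W = {(x, y) : x >= 0 and y >= 0} (assumed form of Example 4)."""
    x, y = p
    return x >= 0 and y >= 0


def scale(k, p):
    """Scalar multiplication k * (x, y) in R^2."""
    return (k * p[0], k * p[1])


def add(p, q):
    """Vector addition in R^2."""
    return (p[0] + q[0], p[1] + q[1])


v = (1, 1)
print(in_W(v))             # True: v lies in W
print(in_W(scale(-1, v)))  # False: (-1)v = (-1, -1) lies outside W
print(in_W(add(v, v)))     # True: this particular sum stays in W
```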


EXAMPLE 5 Subspaces of $M_{nn}$

We know from Theorem 1.7.2 that the sum of two symmetric n × n matrices is symmetric and
that a scalar multiple of a symmetric n × n matrix is symmetric. Thus, the set of symmetric n × n
matrices is closed under addition and scalar multiplication and hence is a subspace of $M_{nn}$.
Similarly, the sets of upper triangular matrices, lower triangular matrices, and diagonal matrices
are subspaces of $M_{nn}$.


EXAMPLE 6 A Subset of $M_{nn}$ That Is Not a Subspace


The set W of invertible n × n matrices is not a subspace of $M_{nn}$, failing on two counts: it is not
closed under addition and not closed under scalar multiplication. We will illustrate this with an
example in $M_{22}$ that you can readily adapt to $M_{nn}$. Consider the matrices

$$U = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} -1 & 2 \\ -2 & 5 \end{bmatrix}$$

Both are invertible, since det(U) = 1 and det(V) = -1. However, the matrix 0U is the 2 × 2 zero
matrix and hence is not invertible, and the matrix U + V has a
column of zeros, so it also is not invertible.
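The determinant facts used in this example are easy to confirm in Python. The sketch below stores a 2 × 2 matrix as a nested list [[a, b], [c, d]]; the helper names `det2`, `mat_add`, and `mat_scale` are illustrative, not from the text.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def mat_add(m, n):
    """Entrywise sum of two 2 x 2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]


def mat_scale(k, m):
    """Scalar multiple k * m of a 2 x 2 matrix."""
    return [[k * m[i][j] for j in range(2)] for i in range(2)]


U = [[1, 2], [2, 5]]
V = [[-1, 2], [-2, 5]]
print(det2(U), det2(V))                     # 1 -1: both invertible
print(det2(mat_scale(0, U)))                # 0: 0U is not invertible
print(mat_add(U, V), det2(mat_add(U, V)))   # [[0, 4], [0, 10]] 0
```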


CALCULUS REQUIRED 

EXAMPLE 7 The Subspace $C(-\infty, \infty)$

There is a theorem in calculus which states that a sum of continuous functions is continuous and
that a constant times a continuous function is continuous. Rephrased in vector language, the set
of continuous functions on $(-\infty, \infty)$ is a subspace of $F(-\infty, \infty)$. We will denote this
subspace by $C(-\infty, \infty)$.


CALCULUS REQUIRED 

EXAMPLE 8 Functions with Continuous Derivatives 

A function with a continuous derivative is said to be continuously differentiable. There is a
theorem in calculus which states that the sum of two continuously differentiable functions is
continuously differentiable and that a constant times a continuously differentiable function is
continuously differentiable. Thus, the functions that are continuously differentiable on
$(-\infty, \infty)$ form a subspace of $F(-\infty, \infty)$. We will denote this subspace by
$C^1(-\infty, \infty)$, where the superscript emphasizes that the first derivative is continuous. To take
this a step further, the set of functions with m continuous derivatives on $(-\infty, \infty)$ is a
subspace of $F(-\infty, \infty)$ as is the set of functions with derivatives of all orders on
$(-\infty, \infty)$. We will denote these subspaces by $C^m(-\infty, \infty)$ and $C^\infty(-\infty, \infty)$,
respectively.


EXAMPLE 9 The Subspace of All Polynomials 

Recall that a polynomial is a function that can be expressed in the form 

$$p(x) = a_0 + a_1x + \cdots + a_nx^n \qquad (1)$$

where $a_0, a_1, \ldots, a_n$ are constants. It is evident that the sum of two polynomials is a
polynomial and that a constant times a polynomial is a polynomial. Thus, the set W of all
polynomials is closed under addition and scalar multiplication and hence is a subspace of
$F(-\infty, \infty)$. We will denote this space by $P_\infty$.


EXAMPLE 10 The Subspace of Polynomials of Degree ≤ n

Recall that the degree of a polynomial is the highest power of the variable that occurs with a 
nonzero coefficient. Thus, for example, if $a_n \neq 0$ in Formula 1, then that polynomial has degree n.
It is not true that the set W of polynomials of degree exactly n (for a fixed positive integer n) is a subspace
of $F(-\infty, \infty)$ because that set is not closed under addition. For example, the polynomials

$$1 + 2x + 3x^2 \quad \text{and} \quad 5 + 7x - 3x^2$$

both have degree 2, but their sum has degree 1. What is true, however, is that for each nonnegative
integer n the polynomials of degree n or less form a subspace of $F(-\infty, \infty)$. We will denote
this space by $P_n$.
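A polynomial $a_0 + a_1x + \cdots + a_nx^n$ can be stored as its coefficient list [a_0, a_1, ..., a_n], which makes the drop in degree easy to see. This is a minimal Python sketch with illustrative helper names (`degree`, `poly_add`); `degree` assigns degree zero to every constant, including 0, following the convention adopted in this text.

```python
def degree(coeffs):
    """Degree of a0 + a1*x + ... + an*x^n given as [a0, a1, ..., an].

    Follows this text's convention: every constant, including 0, has degree 0.
    """
    for i in range(len(coeffs) - 1, 0, -1):
        if coeffs[i] != 0:
            return i
    return 0


def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


p = [1, 2, 3]    # 1 + 2x + 3x^2
q = [5, 7, -3]   # 5 + 7x - 3x^2
s = poly_add(p, q)
print(degree(p), degree(q))  # 2 2
print(s, degree(s))          # [6, 9, 0] 1
```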


In this text we regard all constants to be 
polynomials of degree zero. Be aware, however, 
that some authors do not assign a degree to the 
constant 0. 


The Hierarchy of Function Spaces 

It is proved in calculus that polynomials are continuous functions and have continuous derivatives of all orders
on $(-\infty, \infty)$. Thus, it follows that $P_\infty$ is not only a subspace of $F(-\infty, \infty)$, as previously observed, but
is also a subspace of $C^\infty(-\infty, \infty)$. We leave it for you to convince yourself that the vector spaces
discussed in Example 7 to Example 10 are “nested” one inside the other as illustrated in Figure 4.2.5.


Figure 4.2.5 $P_\infty(-\infty, \infty) \subset C^\infty(-\infty, \infty) \subset C^m(-\infty, \infty) \subset C^1(-\infty, \infty) \subset C(-\infty, \infty)$


In our previous examples, and as illustrated in Figure 4.2.5, we have only considered functions that
are defined at all points of the interval $(-\infty, \infty)$. Sometimes we will want to consider functions that are
only defined on some subinterval of $(-\infty, \infty)$, say the closed interval [a, b] or the open interval (a, b). In
such cases we will make an appropriate notation change. For example, C[a, b] is the space of continuous
functions on [a, b] and C(a, b) is the space of continuous functions on (a, b).


Building Subspaces 

The following theorem provides a useful way of creating a new subspace from known subspaces. 


THEOREM 4.2.2 

If $W_1, W_2, \ldots, W_r$ are subspaces of a vector space V, then the intersection of these subspaces is also a
subspace of V.


Note that the first step in proving Theorem 4.2.2 
was to establish that W contained at least one 
vector. This is important, for otherwise the 
subsequent argument might be logically correct 
but meaningless. 


Proof Let W be the intersection of the subspaces $W_1, W_2, \ldots, W_r$. This set is not empty because each of these
subspaces contains the zero vector of V, and hence so does their intersection. Thus, it remains to show that W is
closed under addition and scalar multiplication.

To prove closure under addition, let u and v be vectors in W. Since W is the intersection of $W_1, W_2, \ldots, W_r$, it
follows that u and v also lie in each of these subspaces. Since these subspaces are all closed under addition,
they all contain the vector u + v and hence so does their intersection W. This proves that W is closed under
addition. We leave the proof that W is closed under scalar multiplication to you.

Sometimes we will want to find the “smallest” subspace of a vector space V that contains all of the vectors in 
some set of interest. The following definition, which generalizes Definition 4 of Section 3.1, will help us to do 
that. 


If r = 1, then Equation (2) has the form
$w = k_1v_1$, in which case the linear combination
is just a scalar multiple of $v_1$.


DEFINITION 2 

If w is a vector in a vector space V, then w is said to be a linear combination of the vectors
$v_1, v_2, \ldots, v_r$ in V if w can be expressed in the form

$$w = k_1v_1 + k_2v_2 + \cdots + k_rv_r \qquad (2)$$

where $k_1, k_2, \ldots, k_r$ are scalars. These scalars are called the coefficients of the linear combination.


THEOREM 4.2.3 

If $S = \{w_1, w_2, \ldots, w_r\}$ is a nonempty set of vectors in a vector space V, then:

(a) The set W of all possible linear combinations of the vectors in S is a subspace of V.

(b) The set W in part (a) is the “smallest” subspace of V that contains all of the vectors in S in the sense 
that any other subspace that contains those vectors contains W. 


Proof (a) Let W be the set of all possible linear combinations of the vectors in S. We must show that W is
closed under addition and scalar multiplication. To prove closure under addition, let

$$u = c_1w_1 + c_2w_2 + \cdots + c_rw_r \quad \text{and} \quad v = k_1w_1 + k_2w_2 + \cdots + k_rw_r$$

be two vectors in W. It follows that their sum can be written as

$$u + v = (c_1 + k_1)w_1 + (c_2 + k_2)w_2 + \cdots + (c_r + k_r)w_r$$

which is a linear combination of the vectors in S. Thus, W is closed under addition. We leave it for you to prove
that W is also closed under scalar multiplication and hence is a subspace of V.

Proof (b) Let W′ be any subspace of V that contains all of the vectors in S. Since W′ is closed under addition
and scalar multiplication, it contains all linear combinations of the vectors in S and hence contains W.


The following definition gives some important notation and terminology related to Theorem 4.2.3. 


DEFINITION 3 

The subspace of a vector space V that is formed from all possible linear combinations of the vectors in 
a nonempty set S is called the span of S, and we say that the vectors in S span that subspace. If 
$S = \{w_1, w_2, \ldots, w_r\}$, then we denote the span of S by

$$\operatorname{span}\{w_1, w_2, \ldots, w_r\} \quad \text{or} \quad \operatorname{span}(S)$$


EXAMPLE 11 The Standard Unit Vectors Span $R^n$

Recall that the standard unit vectors in $R^n$ are

$$e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, 0, \ldots, 1)$$

These vectors span $R^n$ since every vector $v = (v_1, v_2, \ldots, v_n)$ in $R^n$ can be expressed as

$$v = v_1e_1 + v_2e_2 + \cdots + v_ne_n$$

which is a linear combination of $e_1, e_2, \ldots, e_n$. Thus, for example, the vectors

$$i = (1, 0, 0), \quad j = (0, 1, 0), \quad k = (0, 0, 1)$$

span $R^3$ since every vector v = (a, b, c) in this space can be expressed as

$$v = (a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = ai + bj + ck$$


EXAMPLE 12 A Geometric View of Spanning in $R^2$ and $R^3$

(a) If v is a nonzero vector in $R^2$ or $R^3$ that has its initial point at the origin, then span{v}, which
is the set of all scalar multiples of v, is the line through the origin determined by v. You should
be able to visualize this from Figure 4.2.6a by observing that the tip of the vector kv can be
made to fall at any point on the line by choosing the value of k appropriately.



George William Hill (1838-1914) 

The terms linearly independent and linearly dependent were 
introduced by Maxime Bocher (see p. 7) in his book Introduction to Higher Algebra, 
published in 1907. The term linear combination is due to the American mathematician 
G. W. Hill, who introduced it in a research paper on planetary motion published in 
1900. Hill was a “loner” who preferred to work out of his home in West Nyack, New 
York, rather than in academia, though he did try lecturing at Columbia University for a 
few years. Interestingly, he apparently returned the teaching salary, indicating that he 
did not need the money and did not want to be bothered looking after it. Although 
technically a mathematician, Hill had little interest in modern developments of 
mathematics and worked almost entirely on the theory of planetary orbits. 

[Image: Courtesy of the American Mathematical Society] 


(b) If $v_1$ and $v_2$ are nonzero vectors in $R^3$ that have their initial points at the origin and are not scalar
multiples of one another, then $\operatorname{span}\{v_1, v_2\}$, which consists of all linear combinations of $v_1$ and $v_2$,
is the plane through the origin determined by these two vectors. You should be able to visualize this from
Figure 4.2.6b by observing that the tip of the vector $k_1v_1 + k_2v_2$ can be made to fall at any point in the
plane by adjusting the scalars $k_1$ and $k_2$ to lengthen, shorten, or reverse the directions of the
vectors $k_1v_1$ and $k_2v_2$ appropriately.



Figure 4.2.6 (a) Span{v} is the line through the origin determined by v. (b) Span{$v_1$, $v_2$} is the plane through the origin determined by $v_1$ and $v_2$.


EXAMPLE 13 A Spanning Set for P n 

The polynomials $1, x, x^2, \ldots, x^n$ span the vector space $P_n$ defined in Example 10 since each
polynomial p in $P_n$ can be written as

$$p = a_0 + a_1x + \cdots + a_nx^n$$

which is a linear combination of $1, x, x^2, \ldots, x^n$. We can denote this by writing

$$P_n = \operatorname{span}\{1, x, x^2, \ldots, x^n\}$$


The next two examples are concerned with two important types of problems: 

Given a set S of vectors in $R^n$ and a vector v in $R^n$, determine whether v is a linear combination of the
vectors in S.

Given a set S of vectors in $R^n$, determine whether the vectors span $R^n$.

EXAMPLE 14 Linear Combinations 

Consider the vectors u = (1, 2, -1) and v = (6, 4, 2) in $R^3$. Show that w = (9, 2, 7) is a
linear combination of u and v and that w′ = (4, -1, 8) is not a linear combination of u and v.

Solution In order for w to be a linear combination of u and v, there must be scalars $k_1$ and $k_2$
such that $w = k_1u + k_2v$; that is,

$$(9, 2, 7) = k_1(1, 2, -1) + k_2(6, 4, 2)$$

or

$$(9, 2, 7) = (k_1 + 6k_2,\ 2k_1 + 4k_2,\ -k_1 + 2k_2)$$

Equating corresponding components gives

$$\begin{aligned} k_1 + 6k_2 &= 9 \\ 2k_1 + 4k_2 &= 2 \\ -k_1 + 2k_2 &= 7 \end{aligned}$$

Solving this system using Gaussian elimination yields $k_1 = -3$, $k_2 = 2$, so

$$w = -3u + 2v$$

Similarly, for w′ to be a linear combination of u and v, there must be scalars $k_1$ and $k_2$ such that
$w' = k_1u + k_2v$; that is,

$$(4, -1, 8) = k_1(1, 2, -1) + k_2(6, 4, 2)$$

or

$$(4, -1, 8) = (k_1 + 6k_2,\ 2k_1 + 4k_2,\ -k_1 + 2k_2)$$

Equating corresponding components gives

$$\begin{aligned} k_1 + 6k_2 &= 4 \\ 2k_1 + 4k_2 &= -1 \\ -k_1 + 2k_2 &= 8 \end{aligned}$$

This system of equations is inconsistent (verify), so no such scalars $k_1$ and $k_2$ exist.
Consequently, w′ is not a linear combination of u and v.
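The elimination in this example can be automated. The following Python sketch (the function name `solve_combination` is illustrative, not from the text) row-reduces the augmented matrix with exact fractions, returns the scalars when the system is consistent, and returns None when a row of the form [0 ... 0 | c] with c ≠ 0 appears. If there were free variables it would set them to zero, so it returns just one of the possible representations.

```python
from fractions import Fraction


def solve_combination(vectors, w):
    """Return scalars [k1, ..., kr] with k1*v1 + ... + kr*vr == w, or None.

    Gauss-Jordan elimination with exact fractions on the augmented matrix
    whose columns are the given vectors followed by w. Free variables, if
    any, are set to 0, so only one representation is returned.
    """
    n, m = len(vectors), len(w)
    rows = [[Fraction(v[i]) for v in vectors] + [Fraction(w[i])] for i in range(m)]
    pivots, r = [], 0
    for c in range(n):
        pr = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pr is None:
            continue                      # no pivot in column c
        rows[r], rows[pr] = rows[pr], rows[r]
        piv = rows[r][c]
        rows[r] = [x / piv for x in rows[r]]  # make the leading entry 1
        for i in range(m):                # clear column c in every other row
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # A row [0 ... 0 | c] with c != 0 means the system is inconsistent.
    if any(all(x == 0 for x in row[:n]) and row[n] != 0 for row in rows):
        return None
    k = [Fraction(0)] * n
    for row, c in enumerate(pivots):
        k[c] = rows[row][n]
    return k


u, v = (1, 2, -1), (6, 4, 2)
print(solve_combination([u, v], (9, 2, 7)))   # [Fraction(-3, 1), Fraction(2, 1)]
print(solve_combination([u, v], (4, -1, 8)))  # None
```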


EXAMPLE 15 Testing for Spanning 


Determine whether $v_1 = (1, 1, 2)$, $v_2 = (1, 0, 1)$, and $v_3 = (2, 1, 3)$ span the vector space $R^3$.


Solution We must determine whether an arbitrary vector $b = (b_1, b_2, b_3)$ in $R^3$ can be
expressed as a linear combination

$$b = k_1v_1 + k_2v_2 + k_3v_3$$

of the vectors $v_1$, $v_2$, and $v_3$. Expressing this equation in terms of components gives

$$(b_1, b_2, b_3) = k_1(1, 1, 2) + k_2(1, 0, 1) + k_3(2, 1, 3)$$

or

$$(b_1, b_2, b_3) = (k_1 + k_2 + 2k_3,\ k_1 + k_3,\ 2k_1 + k_2 + 3k_3)$$

or

$$\begin{aligned} k_1 + k_2 + 2k_3 &= b_1 \\ k_1 + k_3 &= b_2 \\ 2k_1 + k_2 + 3k_3 &= b_3 \end{aligned}$$

Thus, our problem reduces to ascertaining whether this system is consistent for all values of $b_1$,
$b_2$, and $b_3$. One way of doing this is to use parts (e) and (g) of Theorem 2.3.8, which state that
the system is consistent for all such values if and only if its coefficient matrix

$$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix}$$

has a nonzero determinant. But this is not the case here; we leave it for you to confirm that
det(A) = 0, so $v_1$, $v_2$, and $v_3$ do not span $R^3$.
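A quick check of det(A) = 0 in Python, assuming the usual cofactor expansion along the first row; the matrix is assembled with $v_1$, $v_2$, $v_3$ as its columns, matching the coefficient matrix above. The helper name `det3` is illustrative.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


v1, v2, v3 = (1, 1, 2), (1, 0, 1), (2, 1, 3)
A = [[v1[r], v2[r], v3[r]] for r in range(3)]   # columns of A are v1, v2, v3
print(A)        # [[1, 1, 2], [1, 0, 1], [2, 1, 3]]
print(det3(A))  # 0, so the system is not consistent for every b
```

The zero determinant has a simple explanation here: $v_3 = v_1 + v_2$, so $v_3$ contributes nothing beyond what $v_1$ and $v_2$ already span.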


Solution Spaces of Homogeneous Systems 


The solutions of a homogeneous linear system Ax = 0 of m equations in n unknowns can be viewed as vectors
in $R^n$. The following theorem provides a useful insight into the geometric structure of the solution set.




THEOREM 4.2.4 


The solution set of a homogeneous linear system Ax = 0 of m equations in n unknowns is a subspace of $R^n$.


Proof Let W be the solution set for the system. The set W is not empty because it contains at least the trivial
solution x = 0.

To show that W is a subspace of $R^n$, we must show that it is closed under addition and scalar multiplication. To
do this, let $x_1$ and $x_2$ be vectors in W. Since these vectors are solutions of Ax = 0, we have

$$Ax_1 = 0 \quad \text{and} \quad Ax_2 = 0$$

It follows from these equations and the distributive property of matrix multiplication that

$$A(x_1 + x_2) = Ax_1 + Ax_2 = 0 + 0 = 0$$

so W is closed under addition. Similarly, if k is any scalar then

$$A(kx_1) = kAx_1 = k0 = 0$$

so W is also closed under scalar multiplication.

Because the solution set of a homogeneous
system in n unknowns is actually a subspace of
$R^n$, we will generally refer to it as the solution
space of the system.


EXAMPLE 16 Solution Spaces of Homogeneous Systems 


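Consider the linear systems

(a) $$\begin{bmatrix} 1 & -2 & 3 \\ 2 & -4 & 6 \\ 3 & -6 & 9 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

(b) $$\begin{bmatrix} 1 & -2 & 3 \\ -3 & 7 & -8 \\ -2 & 4 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

(c) $$\begin{bmatrix} 1 & -2 & 3 \\ -3 & 7 & -8 \\ 4 & 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

(d) $$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$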


Solution

(a) We leave it for you to verify that the solutions are

$$x = 2s - 3t, \quad y = s, \quad z = t$$

from which it follows that

$$x = 2y - 3z \quad \text{or} \quad x - 2y + 3z = 0$$

This is the equation of a plane through the origin that has n = (1, -2, 3) as a normal.

(b) We leave it for you to verify that the solutions are

$$x = -5t, \quad y = -t, \quad z = t$$

which are parametric equations for the line through the origin that is parallel to the vector
v = (-5, -1, 1).

(c) We leave it for you to verify that the only solution is x = 0, y = 0, z = 0, so the solution
space is {0}.

(d) This linear system is satisfied by all real values of x, y, and z, so the solution space is all of $R^3$.
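The four answers can be reproduced by row reduction. The Python sketch below (the name `solution_space_generators` is illustrative, not from the text) brings A to reduced row echelon form with exact fractions and returns one vector for each free variable, found by setting that variable to 1 and the other free variables to 0; every solution of Ax = 0 is then a linear combination of the returned vectors.

```python
from fractions import Fraction


def solution_space_generators(A):
    """Return vectors whose linear combinations are exactly the solutions of Ax = 0.

    A is a list of rows. The matrix is brought to reduced row echelon form
    with exact fractions; for each free variable we set it to 1, the other
    free variables to 0, and read off the leading variables.
    """
    m, n = len(A), len(A[0])
    R = [[Fraction(x) for x in row] for row in A]
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break                          # no rows left to hold another pivot
        pr = next((i for i in range(r, m) if R[i][c] != 0), None)
        if pr is None:
            continue                       # column c has no pivot: a free variable
        R[r], R[pr] = R[pr], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]     # make the leading entry 1
        for i in range(m):                 # clear column c above and below the pivot
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    generators = []
    for fc in free:
        x = [Fraction(0)] * n
        x[fc] = Fraction(1)
        for row, pc in enumerate(pivots):  # leading variable = -(entry in free column)
            x[pc] = -R[row][fc]
        generators.append(x)
    return generators


systems = {
    "a": [[1, -2, 3], [2, -4, 6], [3, -6, 9]],
    "b": [[1, -2, 3], [-3, 7, -8], [-2, 4, -6]],
    "c": [[1, -2, 3], [-3, 7, -8], [4, 1, 2]],
    "d": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
}
for label, A in systems.items():
    gens = solution_space_generators(A)
    print(label, [[str(x) for x in g] for g in gens])
```

For system (a) it returns (2, 1, 0) and (-3, 0, 1), matching x = 2s - 3t, y = s, z = t; for (b) it returns (-5, -1, 1); for (c) it returns no vectors, since only the trivial solution exists; and for (d) it returns the three standard unit vectors of $R^3$.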


Whereas the solution set of every homogeneous system of m equations in n unknowns is a subspace
of $R^n$, it is never true that the solution set of a nonhomogeneous system of m equations in n unknowns is a
subspace of $R^n$. There are two possible scenarios: first, the system may not have any solutions at all, and
second, if there are solutions, then the solution set will be closed neither under addition nor under scalar
multiplication (in particular, it will not contain the zero vector) (Exercise 18).


A Concluding Observation 

It is important to recognize that spanning sets are not unique. For example, any nonzero vector on the line in
Figure 4.2.6a will span that line, and any two noncollinear vectors in the plane in Figure 4.2.6b will span that
plane. The following theorem, whose proof we leave as an exercise, states conditions under which two sets of
vectors will span the same space.


THEOREM 4.2.5 

If $S = \{v_1, v_2, \ldots, v_r\}$ and $S' = \{w_1, w_2, \ldots, w_k\}$ are nonempty sets of vectors in a vector space V,
then

$$\operatorname{span}\{v_1, v_2, \ldots, v_r\} = \operatorname{span}\{w_1, w_2, \ldots, w_k\}$$

if and only if each vector in S is a linear combination of those in S′, and each vector in S′ is a linear
combination of those in S.
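For vectors in $R^n$, Theorem 4.2.5 becomes a finite test: check that each vector of S is a linear combination of the vectors in S′ and vice versa. In the Python sketch below (the function names `in_span` and `same_span` are illustrative) each membership question is settled by forward elimination on an augmented matrix with exact fractions. The example sets are our own, not from the text.

```python
from fractions import Fraction


def in_span(vectors, w):
    """True if w is a linear combination of the given vectors.

    Forward elimination on the augmented matrix [v1 ... vr | w] with exact
    fractions; the system is inconsistent exactly when a row [0 ... 0 | c]
    with c != 0 remains.
    """
    n, m = len(vectors), len(w)
    rows = [[Fraction(v[i]) for v in vectors] + [Fraction(w[i])] for i in range(m)]
    r = 0
    for c in range(n):
        pr = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        for i in range(r + 1, m):          # eliminate below the pivot
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return not any(all(x == 0 for x in row[:n]) and row[n] != 0 for row in rows)


def same_span(S, S_prime):
    """Theorem 4.2.5: span(S) == span(S') iff each set lies in the span of the other."""
    return (all(in_span(S_prime, v) for v in S)
            and all(in_span(S, w) for w in S_prime))


S = [(1, 0, 0), (0, 1, 0)]
S_prime = [(1, 1, 0), (1, -1, 0)]   # another spanning set for the xy-plane
S_other = [(1, 0, 0), (0, 0, 1)]    # spans the xz-plane instead
print(same_span(S, S_prime))  # True
print(same_span(S, S_other))  # False
```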


Concept Review 

Subspace 


Zero subspace 
Examples of subspaces 
Linear combination 
Span 

Solution space 

Skills 

Determine whether a subset of a vector space is a subspace. 

Show that a subset of a vector space is a subspace. 

Show that a nonempty subset of a vector space is not a subspace by demonstrating that the set is 
either not closed under addition or not closed under scalar multiplication. 

Given a set S of vectors in $R^n$ and a vector v in $R^n$, determine whether v is a linear combination of
the vectors in S.

Given a set S of vectors in $R^n$, determine whether the vectors in S span $R^n$.

Determine whether two nonempty sets of vectors in a vector space V span the same subspace of V. 


Exercise Set 4.2 

1. Use Theorem 4.2.1 to determine which of the following are subspaces of $R^3$.

(a) All vectors of the form (a, 0, 0). 

(b) All vectors of the form (a, 1, 1). 

(c) All vectors of the form (a, b, c), where b = a + c.

(d) All vectors of the form (a, b, c), where b = a + c + 1.

(e) All vectors of the form ( a, b, 0). 

Answer: 

(a), (c), (e) 

2. Use Theorem 4.2.1 to determine which of the following are subspaces of $M_{nn}$.

(a) The set of all diagonal n × n matrices.

(b) The set of all n × n matrices A such that det(A) = 0.

(c) The set of all n × n matrices A such that tr(A) = 0.

(d) The set of all symmetric n × n matrices.

(e) The set of all n × n matrices A such that $A^T = -A$.

(f) The set of all n × n matrices A for which Ax = 0 has only the trivial solution.

(g) The set of all n × n matrices A such that AB = BA for some fixed n × n matrix B.

3. Use Theorem 4.2.1 to determine which of the following are subspaces of $P_3$.

(a) All polynomials $a_0 + a_1x + a_2x^2 + a_3x^3$ for which $a_0 = 0$.

(b) All polynomials $a_0 + a_1x + a_2x^2 + a_3x^3$ for which $a_0 + a_1 + a_2 + a_3 = 0$.

(c) All polynomials of the form $a_0 + a_1x + a_2x^2 + a_3x^3$ in which $a_0$, $a_1$, $a_2$, and $a_3$ are integers.

(d) All polynomials of the form $a_0 + a_1x$, where $a_0$ and $a_1$ are real numbers.

Answer:

(a), (b), (d)

4. Which of the following are subspaces of $F(-\infty, \infty)$?

(a) All functions f in $F(-\infty, \infty)$ for which f(0) = 0.

(b) All functions f in $F(-\infty, \infty)$ for which f(0) = 1.

(c) All functions f in $F(-\infty, \infty)$ for which f(-x) = f(x).

(d) All polynomials of degree 2. 

5. Which of the following are subspaces of $R^\infty$?

(a) All sequences v in $R^\infty$ of the form v = (v, 0, v, 0, v, 0, ...).

(b) All sequences v in $R^\infty$ of the form v = (v, 1, v, 1, v, 1, ...).

(c) All sequences v in $R^\infty$ of the form v = (v, 2v, 4v, 8v, 16v, ...).

(d) All sequences in $R^\infty$ whose components are 0 from some point on.

Answer: 

(a), (c), (d) 

6. A line L through the origin in $R^3$ can be represented by parametric equations of the form x = at, y = bt,
and z = ct. Use these equations to show that L is a subspace of $R^3$ by showing that if $v_1 = (x_1, y_1, z_1)$ and
$v_2 = (x_2, y_2, z_2)$ are points on L and k is any real number, then $kv_1$ and $v_1 + v_2$ are also points on L.

7. Which of the following are linear combinations of u = (0, -2, 2) and v = (1, 3, -1)?

(a) (2, 2, 2)

(b) (3, 1, 5)

(c) (0, 4, 5)

(d) (0, 0, 0)

Answer: 

(a), (b), (d) 

8. Express the following as linear combinations of u = (2, 1, 4), v = (1, -1, 3), and w = (3, 2, 5).

(a) (-9, -7, -15)

(b) (6, 11, 6)

(c) (0,0,0) 

(d) (7,8,9) 


9. Which of the following are linear combinations of 



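(a) $\begin{bmatrix} 6 & -8 \\ -1 & -8 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 6 & 0 \\ 3 & 8 \end{bmatrix}$

(d) $\begin{bmatrix} -1 & 5 \\ 7 & 1 \end{bmatrix}$

Answer:

(a), (b), (c)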

10. In each part express the vector as a linear combination of $p_1 = 2 + x + 4x^2$, $p_2 = 1 - x + 3x^2$, and
$p_3 = 3 + 2x + 5x^2$.

(a) $-9 - 7x - 15x^2$

(b) $6 + 11x + 6x^2$

(c) 0 

(d) 7 + 8x + 9x 2 

11. In each part, determine whether the given vectors span $R^3$.

(a) $v_1 = (2, 2, 2)$, $v_2 = (0, 0, 3)$, $v_3 = (0, 1, 1)$

(b) $v_1 = (2, -1, 3)$, $v_2 = (4, 1, 2)$, $v_3 = (8, -1, 8)$

(c) $v_1 = (3, 1, 4)$, $v_2 = (2, -3, 5)$, $v_3 = (5, -2, 9)$, $v_4 = (1, 4, -1)$

(d) $v_1 = (1, 2, 6)$, $v_2 = (3, 4, 1)$, $v_3 = (4, 3, 1)$, $v_4 = (3, 3, 1)$

Answer:

(a) The vectors span $R^3$.

(b) The vectors do not span $R^3$.

(c) The vectors do not span $R^3$.

(d) The vectors span $R^3$.

12. Suppose that $v_1 = (2, 1, 0, 3)$, $v_2 = (3, -1, 5, 2)$, and $v_3 = (-1, 0, 2, 1)$. Which of the following
vectors are in $\operatorname{span}\{v_1, v_2, v_3\}$?

(a) (2, 3, -7, 3)

(b) (0, 0, 0, 0)

(c) (1, 1, 1, 1)

(d) (-4, 6, -13, 4)



13. Determine whether the following polynomials span $P_2$.

$p_1 = 1 - x + 2x^2$, $p_2 = 3 + x$,

$p_3 = 5 - x + 4x^2$, $p_4 = -2 - 2x + 2x^2$

Answer:

The polynomials do not span $P_2$.

14. Let $f = \cos^2 x$ and $g = \sin^2 x$. Which of the following lie in the space spanned by f and g?

(a) cos 2x

(b) 3+x 2 

(c) 1 

(d) sinx 

(e) 0 

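15. Determine whether the solution space of the system Ax = 0 is a line through the origin, a plane through the
origin, or the origin only. If it is a plane, find an equation for it. If it is a line, find parametric equations for
it.

(a) $A = \begin{bmatrix} -1 & 1 & 1 \\ 3 & -1 & 0 \\ 2 & -4 & -5 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & -2 & 3 \\ -3 & 6 & 9 \\ -2 & 4 & -6 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$

(d) $A = \begin{bmatrix} 1 & 2 & -6 \\ 1 & 4 & 4 \\ 3 & 10 & 6 \end{bmatrix}$

(e) $A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 4 \\ 3 & 1 & 11 \end{bmatrix}$

(f) $A = \begin{bmatrix} 1 & -3 & 1 \\ 2 & -6 & 2 \\ 3 & -9 & 3 \end{bmatrix}$

Answer:

(a) Line; $x = -\tfrac{1}{2}t$, $y = -\tfrac{3}{2}t$, $z = t$

(b) Line; x = 2t, y = t, z = 0

(c) Origin

(d) Origin

(e) Line; x = -3t, y = -2t, z = t

(f) Plane; x - 3y + z = 0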

16. (Calculus required) Show that the following sets of functions are subspaces of $F(-\infty, \infty)$.

(a) All continuous functions on $(-\infty, \infty)$.

(b) All differentiable functions on $(-\infty, \infty)$.

(c) All differentiable functions on $(-\infty, \infty)$ that satisfy $f' + 2f = 0$.

17. (Calculus required) Show that the set of continuous functions f = f(x) on [a, b] such that

$$\int_a^b f(x)\,dx = 0$$

is a subspace of C[a, b].

18. Show that the solution vectors of a consistent nonhomogeneous system of m linear equations in n
unknowns do not form a subspace of $R^n$.

19. Prove Theorem 4.2.5. 

20. Use Theorem 4.2.5 to show that the vectors $v_1 = (1, 6, 4)$, $v_2 = (2, 4, -1)$, $v_3 = (-1, 2, 5)$, and the
vectors $w_1 = (1, -2, -5)$, $w_2 = (0, 8, 9)$ span the same subspace of $R^3$.

True-False Exercises 

In parts (a)-(k) determine whether the statement is true or false, and justify your answer. 

(a) Every subspace of a vector space is itself a vector space. 

Answer: 

True 

(b) Every vector space is a subspace of itself. 

Answer: 

True 

(c) Every subset of a vector space V that contains the zero vector in V is a subspace of V. 

Answer: 

False 

(d) The set $R^2$ is a subspace of $R^3$.

Answer: 

False 

(e) The solution set of a consistent linear system Ax = b of m equations in n unknowns is a subspace of $R^n$.


Answer: 


False 

(f) The span of any finite set of vectors in a vector space is closed under addition and scalar multiplication. 
Answer: 

True 

(g) The intersection of any two subspaces of a vector space V is a subspace of V. 

Answer: 

True 

(h) The union of any two subspaces of a vector space V is a subspace of V. 

Answer: 

False 

(i) Two subsets of a vector space V that span the same subspace of V must be equal. 

Answer: 

False 

(j) The set of upper triangular n × n matrices is a subspace of the vector space of all n × n matrices.
Answer: 

True 

(k) The polynomials x - 1, $(x - 1)^2$, and $(x - 1)^3$ span $P_3$.

Answer: 

False 
4.3 Linear Independence

In this section we will consider the question of whether the vectors in a given set are interrelated in the sense 
that one or more of them can be expressed as a linear combination of the others. This is important to know in 
applications because the existence of such relationships often signals that some kind of complication is likely 
to occur. 


Extraneous Vectors 


In a rectangular xy-coordinate system every vector in the plane can be expressed in exactly one way as a 
linear combination of the standard unit vectors. For example, the only way to express the vector (3, 2) as a 
linear combination of i = (1, 0) and j = (0, 1) is 

$$(3, 2) = 3(1, 0) + 2(0, 1) = 3i + 2j \qquad (1)$$

(Figure 4.3.1). Suppose, however, that we were to introduce a third coordinate axis that makes an angle of 45°
with the x-axis. Call it the w-axis. As illustrated in Figure 4.3.2, the unit vector along the w-axis is

$$w = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$$

Whereas Formula 1 shows the only way to express the vector (3, 2) as a linear combination of i and j, there
are infinitely many ways to express this vector as a linear combination of i, j, and w. Three possibilities are

$$\begin{aligned}
(3, 2) &= 3(1, 0) + 2(0, 1) + 0\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) = 3i + 2j + 0w \\
(3, 2) &= 2(1, 0) + (0, 1) + \sqrt{2}\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) = 2i + j + \sqrt{2}\,w \\
(3, 2) &= 4(1, 0) + 3(0, 1) - \sqrt{2}\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) = 4i + 3j - \sqrt{2}\,w
\end{aligned}$$

In short, by introducing a superfluous axis we created the complication of having multiple ways of assigning
coordinates to points in the plane. What makes the vector w superfluous is the fact that it can be expressed as
a linear combination of the vectors i and j, namely,

$$w = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) = \frac{1}{\sqrt{2}}\,i + \frac{1}{\sqrt{2}}\,j$$


Thus, one of our main tasks in this section will be to develop ways of ascertaining whether one vector in a set 
S is a linear combination of other vectors in S. 









Figure 4.3.1
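The three representations of (3, 2) can be checked numerically. This Python sketch uses floating point, so it compares with `math.isclose` rather than `==`; the helper name `combo` is illustrative, not from the text.

```python
import math


def combo(coeffs, vectors):
    """Return the linear combination sum(c * v) of plane vectors v = (x, y)."""
    x = sum(c * v[0] for c, v in zip(coeffs, vectors))
    y = sum(c * v[1] for c, v in zip(coeffs, vectors))
    return (x, y)


s = 1 / math.sqrt(2)
i, j, w = (1.0, 0.0), (0.0, 1.0), (s, s)   # w is the unit vector along the w-axis

# Coefficients (c_i, c_j, c_w) of the three representations of (3, 2) in the text.
representations = [(3, 2, 0), (2, 1, math.sqrt(2)), (4, 3, -math.sqrt(2))]
for coeffs in representations:
    x, y = combo(coeffs, (i, j, w))
    print(coeffs, math.isclose(x, 3.0) and math.isclose(y, 2.0))

# w itself is (1/sqrt 2) i + (1/sqrt 2) j, which is what makes it superfluous.
wx, wy = combo((s, s), (i, j))
print(math.isclose(wx, w[0]) and math.isclose(wy, w[1]))
```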



Linear Independence and Dependence 


We will often apply the terms linearly 
independent and linearly dependent to the 
vectors themselves rather than to the set. 


DEFINITION 1 

If $S = \{v_1, v_2, \ldots, v_r\}$ is a nonempty set of vectors in a vector space V, then the vector equation

$$k_1v_1 + k_2v_2 + \cdots + k_rv_r = 0$$

has at least one solution, namely,

$$k_1 = 0, \quad k_2 = 0, \quad \ldots, \quad k_r = 0$$

We call this the trivial solution. If this is the only solution, then S is said to be a linearly independent
set. If there are solutions in addition to the trivial solution, then S is said to be a linearly dependent
set.


EXAMPLE 1 Linear Independence of the Standard Unit Vectors in $R^n$


The most basic linearly independent set in $R^n$ is the set of standard unit vectors

$$e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, 0, \ldots, 1)$$

For notational simplicity, we will prove the linear independence in $R^3$ of

$$i = (1, 0, 0), \quad j = (0, 1, 0), \quad k = (0, 0, 1)$$

The linear independence or linear dependence of these vectors is determined by whether there exist nontrivial
solutions of the vector equation

$$k_1i + k_2j + k_3k = 0 \qquad (2)$$

Since the component form of this equation is

$$(k_1, k_2, k_3) = (0, 0, 0)$$

it follows that $k_1 = k_2 = k_3 = 0$. This implies that (2) has only the trivial solution and hence that the vectors are
linearly independent.


EXAMPLE 2 Linear Independence in $R^3$

Determine whether the vectors

$$v_1 = (1, -2, 3), \quad v_2 = (5, 6, -1), \quad v_3 = (3, 2, 1)$$

are linearly independent or linearly dependent in $R^3$.

Solution The linear independence or linear dependence of these vectors is determined by
whether there exist nontrivial solutions of the vector equation

$$k_1v_1 + k_2v_2 + k_3v_3 = 0 \qquad (3)$$

or, equivalently, of

$$k_1(1, -2, 3) + k_2(5, 6, -1) + k_3(3, 2, 1) = (0, 0, 0)$$

Equating corresponding components on the two sides yields the homogeneous linear system

$$\begin{aligned} k_1 + 5k_2 + 3k_3 &= 0 \\ -2k_1 + 6k_2 + 2k_3 &= 0 \\ 3k_1 - k_2 + k_3 &= 0 \end{aligned} \qquad (4)$$

Thus, our problem reduces to determining whether this system has nontrivial solutions. There
are various ways to do this; one possibility is to simply solve the system, which yields

$$k_1 = -\tfrac{1}{2}t, \quad k_2 = -\tfrac{1}{2}t, \quad k_3 = t$$

(we omit the details). This shows that the system has nontrivial solutions and hence that the
vectors are linearly dependent. A second method for obtaining the same result is to compute the
determinant of the coefficient matrix

$$A = \begin{bmatrix} 1 & 5 & 3 \\ -2 & 6 & 2 \\ 3 & -1 & 1 \end{bmatrix}$$

and use parts (b) and (g) of Theorem 2.3.8. We leave it for you to verify that det(A) = 0, from
which it follows that (3) has nontrivial solutions and the vectors are linearly dependent.
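Both methods in this example can be confirmed in a few lines of Python. The sketch builds A with $v_1$, $v_2$, $v_3$ as its columns, evaluates the determinant by cofactor expansion along the first row (the helper `det3` is illustrative), and substitutes t = 2 into the general solution above.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


v1, v2, v3 = (1, -2, 3), (5, 6, -1), (3, 2, 1)
A = [[v1[r], v2[r], v3[r]] for r in range(3)]   # columns of A are v1, v2, v3
print(A)        # [[1, 5, 3], [-2, 6, 2], [3, -1, 1]]
print(det3(A))  # 0

# The general solution with t = 2 gives k1 = -1, k2 = -1, k3 = 2.
k1, k2, k3 = -1, -1, 2
result = tuple(k1 * a + k2 * b + k3 * c for a, b, c in zip(v1, v2, v3))
print(result)   # (0, 0, 0): a nontrivial linear relation among v1, v2, v3
```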




In Example 2, what relationship do you see
between the components of $v_1$, $v_2$, and $v_3$ and
the columns of the coefficient matrix A?


EXAMPLE 3 Linear Independence in f? 4 

Determine whether the vectors 

$$v_1 = (1, 2, 2, -1),\quad v_2 = (4, 9, 9, -4),\quad v_3 = (5, 8, 9, -5)$$
in $R^4$ are linearly dependent or linearly independent.


The linear independence or linear dependence of these vectors is determined by 
whether there exist nontrivial solutions of the vector equation 

$$k_1 v_1 + k_2 v_2 + k_3 v_3 = 0$$


or, equivalently, of 

$$k_1(1, 2, 2, -1) + k_2(4, 9, 9, -4) + k_3(5, 8, 9, -5) = (0, 0, 0, 0)$$


Equating corresponding components on the two sides yields the homogeneous linear system 

$$\begin{aligned} k_1 + 4k_2 + 5k_3 &= 0 \\ 2k_1 + 9k_2 + 8k_3 &= 0 \\ 2k_1 + 9k_2 + 9k_3 &= 0 \\ -k_1 - 4k_2 - 5k_3 &= 0 \end{aligned}$$

We leave it for you to show that this system has only the trivial solution 

$$k_1 = 0,\quad k_2 = 0,\quad k_3 = 0$$

from which you can conclude that $v_1$, $v_2$, and $v_3$ are linearly independent.
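The verification left to the reader in Example 3 amounts to row reducing the 4 x 3 coefficient matrix. The sketch below (our own `rref` helper, exact fractions) shows that every column gets a pivot, so there are no free variables and the homogeneous system has only the trivial solution.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form, computed with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        pr = next((i for i in range(pivot_row, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue  # no pivot in this column
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        p = m[pivot_row][col]
        m[pivot_row] = [x / p for x in m[pivot_row]]    # make the pivot 1
        for i in range(len(m)):                          # clear the rest of the column
            if i != pivot_row and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m


# Coefficient matrix of the homogeneous system in Example 3
A = [[1, 4, 5],
     [2, 9, 8],
     [2, 9, 9],
     [-1, -4, -5]]

R = rref(A)
for row in R:
    print([int(x) for x in row])
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
# [0, 0, 0]
```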


EXAMPLE 4 An Important Linearly Independent Set in $P_n$

Show that the polynomials 

$$1, x, x^2, \ldots, x^n$$

form a linearly independent set in $P_n$.

For convenience, let us denote the polynomials as 

$$p_0 = 1,\quad p_1 = x,\quad p_2 = x^2,\quad \ldots,\quad p_n = x^n$$
We must show that the vector equation 

$$a_0 p_0 + a_1 p_1 + a_2 p_2 + \cdots + a_n p_n = 0 \qquad (5)$$

has only the trivial solution

$$a_0 = a_1 = a_2 = \cdots = a_n = 0$$


But (5) is equivalent to the statement that

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0 \qquad (6)$$

for all x in $(-\infty, \infty)$, so we must show that this holds if and only if each coefficient in (6) is zero.
To see that this is so, recall from algebra that a nonzero polynomial of degree at most n has at most n
distinct roots. That being the case, each coefficient in (6) must be zero, for otherwise the left side of
the equation would be a nonzero polynomial with infinitely many roots. Thus, (5) has only the
trivial solution.


The following example shows that the problem of determining whether a given set of vectors in $P_n$ is linearly
independent or linearly dependent can be reduced to determining whether a certain set of vectors in $R^{n+1}$ is
linearly dependent or independent.

EXAMPLE 5 Linear Independence of Polynomials 

Determine whether the polynomials 


$$p_1 = 1 - x,\quad p_2 = 5 + 3x - 2x^2,\quad p_3 = 1 + 3x - x^2$$


are linearly dependent or linearly independent in $P_2$.


The linear independence or linear dependence of these vectors is determined by 
whether there exist nontrivial solutions of the vector equation 


$$k_1 p_1 + k_2 p_2 + k_3 p_3 = 0 \qquad (7)$$


This equation can be written as

$$k_1(1 - x) + k_2(5 + 3x - 2x^2) + k_3(1 + 3x - x^2) = 0$$

or, equivalently, as

$$(k_1 + 5k_2 + k_3) + (-k_1 + 3k_2 + 3k_3)x + (-2k_2 - k_3)x^2 = 0$$


Since this equation must be satisfied by all x in $(-\infty, \infty)$, each coefficient must be zero (as
explained in the previous example). Thus, the linear dependence or independence of the given
polynomials hinges on whether the following linear system has a nontrivial solution:


$$\begin{aligned} k_1 + 5k_2 + k_3 &= 0 \\ -k_1 + 3k_2 + 3k_3 &= 0 \\ -2k_2 - k_3 &= 0 \end{aligned} \qquad (9)$$


We leave it for you to show that this linear system has nontrivial solutions, either by solving it
directly or by showing that the coefficient matrix has determinant zero. Thus, the set
$\{p_1, p_2, p_3\}$ is linearly dependent.
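Writing each polynomial in $P_2$ as its list of coefficients turns Example 5 into a question about vectors in $R^3$. The sketch below uses that representation (our choice of encoding; `combine` is a hypothetical helper) to check one nontrivial solution of (9), derived by hand as noted in the comments.

```python
# Coefficient lists [constant term, coefficient of x, coefficient of x^2]
p1 = [1, -1, 0]     # 1 - x
p2 = [5, 3, -2]     # 5 + 3x - 2x^2
p3 = [1, 3, -1]     # 1 + 3x - x^2


def combine(ks, polys):
    """Coefficient list of k1*p1 + k2*p2 + ... for same-length coefficient lists."""
    return [sum(k * p[i] for k, p in zip(ks, polys)) for i in range(len(polys[0]))]


# From the third equation of (9), k3 = -2*k2; the first then gives k1 = -3*k2.
# Choosing k2 = 1 yields one nontrivial solution.
ks = (-3, 1, -2)
print(combine(ks, [p1, p2, p3]))   # [0, 0, 0]
```

So $-3p_1 + p_2 - 2p_3 = 0$, an explicit dependence among the three polynomials.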


In Example 5, what relationship do you see
between the coefficients of the given
polynomials and the column vectors of the
coefficient matrix of system (9)?


An Alternative Interpretation of Linear Independence 

The terms linearly dependent and linearly independent are intended to indicate whether the vectors in a given 
set are interrelated in some way. The following theorem, whose proof is deferred to the end of this section, 
makes this idea more precise. 


THEOREM 4.3.1 

A set S with two or more vectors is 

(a) Linearly dependent if and only if at least one of the vectors in S is expressible as a linear 
combination of the other vectors in S. 

(b) Linearly independent if and only if no vector in S is expressible as a linear combination of the 
other vectors in S. 


EXAMPLE 6 Example 1 Revisited 


In Example 1 we showed that the standard unit vectors in $R^n$ are linearly independent. Thus, it
follows from Theorem 4.3.1 that none of these vectors is expressible as a linear combination of
the others. To illustrate this in $R^3$, suppose, for example, that

$$k = k_1 i + k_2 j$$


or in terms of components that 


$$(0, 0, 1) = (k_1, k_2, 0)$$


Since this equation cannot be satisfied by any values of $k_1$ and $k_2$, there is no way to express k
as a linear combination of i and j. Similarly, i is not expressible as a linear combination of j and
k, and j is not expressible as a linear combination of i and k.


EXAMPLE 7 Example 2 Revisited 

In Example 2 we saw that the vectors 

$$v_1 = (1, -2, 3),\quad v_2 = (5, 6, -1),\quad v_3 = (3, 2, 1)$$

are linearly dependent. Thus, it follows from Theorem 4.3.1 that at least one of these vectors is
expressible as a linear combination of the other two. We leave it for you to confirm that these
vectors satisfy the equation

$$\tfrac{1}{2}v_1 + \tfrac{1}{2}v_2 - v_3 = 0$$
from which it follows, for example, that 

$$v_3 = \tfrac{1}{2}v_1 + \tfrac{1}{2}v_2$$
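A direct componentwise check of the relation in Example 7, using exact fractions so that 1/2 is represented without rounding (variable names are ours):

```python
from fractions import Fraction

v1 = (1, -2, 3)
v2 = (5, 6, -1)
v3 = (3, 2, 1)
half = Fraction(1, 2)

# Left side of (1/2)v1 + (1/2)v2 - v3 = 0, computed componentwise
residual = tuple(half * a + half * b - c for a, b, c in zip(v1, v2, v3))
print(residual == (0, 0, 0))                                       # True
print(tuple(half * a + half * b for a, b in zip(v1, v2)) == v3)    # True
```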


Sets with One or Two Vectors 

The following basic theorem is concerned with the linear independence and linear dependence of sets with 
one or two vectors and sets that contain the zero vector. 


THEOREM 4.3.2 

(a) A finite set that contains 0 is linearly dependent. 

(b) A set with exactly one vector is linearly independent if and only if that vector is not 0. 

(c) A set with exactly two vectors is linearly independent if and only if neither vector is a scalar 
multiple of the other. 



Jozef Hoene de Wronski (1778-1853) 

The Polish-French mathematician Jozef Hoene de Wronski was born Jozef Hoene
and adopted the name Wronski after he married. Wronski’s life was fraught with controversy and
conflict, which some say was due to his psychopathic tendencies and his exaggeration of the
importance of his own work. Although Wronski's work was dismissed as rubbish for many years, and
much of it was indeed erroneous, some of his ideas contained hidden brilliance and have survived.
Among other things, Wronski designed a caterpillar vehicle to compete with trains (though it was
never manufactured) and did research on the famous problem of determining the longitude of a ship at
sea. His final years were spent in poverty.



We will prove part (a) and leave the rest as exercises. 

For any vectors $v_1, v_2, \ldots, v_r$, the set $S = \{v_1, v_2, \ldots, v_r, 0\}$ is linearly dependent since the
equation

$$0v_1 + 0v_2 + \cdots + 0v_r + 1(0) = 0$$

expresses 0 as a linear combination of the vectors in S with coefficients that are not all zero.

EXAMPLE 8 Linear Independence of Two Functions 

The functions $f_1 = x$ and $f_2 = \sin x$ are linearly independent vectors in $F(-\infty, \infty)$ since
neither function is a scalar multiple of the other. On the other hand, the two functions
$g_1 = \sin 2x$ and $g_2 = \sin x \cos x$ are linearly dependent because the trigonometric identity
$\sin 2x = 2 \sin x \cos x$ reveals that $g_1$ and $g_2$ are scalar multiples of each other.


A Geometric Interpretation of Linear Independence 

Linear independence has the following useful geometric interpretations in $R^2$ and $R^3$:

Two vectors in $R^2$ or $R^3$ are linearly independent if and only if they do not lie on the same line when they
have their initial points at the origin. Otherwise one would be a scalar multiple of the other (Figure 4.3.3).



Figure 4.3.3 

Three vectors in $R^3$ are linearly independent if and only if they do not lie in the same plane when they have
their initial points at the origin. Otherwise at least one would be a linear combination of the other two
(Figure 4.3.4).
(a) Linearly dependent   (b) Linearly dependent   (c) Linearly independent

Figure 4.3.4 


At the beginning of this section we observed that a third coordinate axis in $R^2$ is superfluous by showing that
a unit vector along such an axis would have to be expressible as a linear combination of unit vectors along the
positive x- and y-axes. That result is a consequence of the next theorem, which shows that there can be at most
n vectors in any linearly independent set in $R^n$.

It follows from Theorem 4.3.3, for example,
that a set in $R^2$ with more than two vectors is
linearly dependent and a set in $R^3$ with more
than three vectors is linearly dependent.


THEOREM 4.3.3 

Let $S = \{v_1, v_2, \ldots, v_r\}$ be a set of vectors in $R^n$. If $r > n$, then S is linearly dependent.


Suppose that

$$\begin{aligned} v_1 &= (v_{11}, v_{12}, \ldots, v_{1n}) \\ v_2 &= (v_{21}, v_{22}, \ldots, v_{2n}) \\ &\;\;\vdots \\ v_r &= (v_{r1}, v_{r2}, \ldots, v_{rn}) \end{aligned}$$

and consider the equation

$$k_1 v_1 + k_2 v_2 + \cdots + k_r v_r = 0$$

If we express both sides of this equation in terms of components and then equate the corresponding
components, we obtain the system

$$\begin{aligned} v_{11}k_1 + v_{21}k_2 + \cdots + v_{r1}k_r &= 0 \\ v_{12}k_1 + v_{22}k_2 + \cdots + v_{r2}k_r &= 0 \\ &\;\;\vdots \\ v_{1n}k_1 + v_{2n}k_2 + \cdots + v_{rn}k_r &= 0 \end{aligned}$$

This is a homogeneous system of n equations in the r unknowns $k_1, \ldots, k_r$. Since $r > n$, it follows from
Theorem 1.2.2 that the system has nontrivial solutions. Therefore, $S = \{v_1, v_2, \ldots, v_r\}$ is a linearly
dependent set.
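The proof of Theorem 4.3.3 only guarantees that a nontrivial solution exists. The sketch below (Python with exact fractions; the function name `nontrivial_solution` is our own) actually produces one: it reduces the n x r coefficient matrix and sets one free variable to 1. When r > n there are at most n pivots among r columns, so a free variable always exists.

```python
from fractions import Fraction


def nontrivial_solution(vectors):
    """Return [k1, ..., kr], not all zero, with k1 v1 + ... + kr vr = 0,
    or None if the homogeneous system has only the trivial solution."""
    r, n = len(vectors), len(vectors[0])
    # n x r coefficient matrix of the system: column j holds v_j
    m = [[Fraction(vectors[j][i]) for j in range(r)] for i in range(n)]
    pivots = []  # pivots[i] = pivot column of row i after reduction
    row = 0
    for col in range(r):
        if row == n:
            break  # every row already has a pivot; remaining columns are free
        pr = next((i for i in range(row, n) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[row], m[pr] = m[pr], m[row]
        p = m[row][col]
        m[row] = [x / p for x in m[row]]
        for i in range(n):
            if i != row and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[row])]
        pivots.append(col)
        row += 1
    free = [c for c in range(r) if c not in pivots]
    if not free:
        return None
    # Set the first free variable to 1 and the other free variables to 0;
    # row i of the reduced system then reads k_{pivots[i]} + m[i][f] = 0.
    f = free[0]
    k = [Fraction(0)] * r
    k[f] = Fraction(1)
    for i, pc in enumerate(pivots):
        k[pc] = -m[i][f]
    return k


# Three vectors in R^2: by Theorem 4.3.3 they must be linearly dependent
vs = [(1, 2), (3, 4), (5, 6)]
print([int(x) for x in nontrivial_solution(vs)])   # [1, -2, 1]
```

Indeed, $(1, 2) - 2(3, 4) + (5, 6) = (0, 0)$.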

CALCULUS REQUIRED 

Linear Independence of Functions 

Sometimes linear dependence of functions can be deduced from known identities. For example, the functions

$f_1 = \sin^2 x$, $f_2 = \cos^2 x$, and $f_3 = 5$
form a linearly dependent set in $F(-\infty, \infty)$, since the equation

$$5f_1 + 5f_2 - f_3 = 5\sin^2 x + 5\cos^2 x - 5 = 5(\sin^2 x + \cos^2 x) - 5 = 0$$

expresses 0 as a linear combination of $f_1$, $f_2$, and $f_3$ with coefficients that are not all zero.
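Evaluating the combination $5f_1 + 5f_2 - f_3$ at a few points is a quick numerical sanity check of the identity used above. It does not prove the identity (finitely many sample points never can), and the sample points and the tolerance 1e-12 are arbitrary choices made only to absorb floating-point rounding.

```python
import math


def f1(x):
    return math.sin(x) ** 2


def f2(x):
    return math.cos(x) ** 2


def f3(x):
    return 5.0


def combination(x):
    """Value at x of 5 f1 + 5 f2 - f3."""
    return 5 * f1(x) + 5 * f2(x) - f3(x)


samples = [-10.0, -1.5, 0.0, 0.3, 2.0, 7.25]
print(all(math.isclose(combination(x), 0.0, abs_tol=1e-12) for x in samples))  # True
```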

Unfortunately, there is no general method that can be used to determine whether a set of functions is linearly 
independent or linearly dependent. However, there does exist a theorem that is useful for establishing linear 
independence in certain circumstances. The following definition will be useful for discussing that theorem. 


DEFINITION 2 


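If $f_1 = f_1(x), f_2 = f_2(x), \ldots, f_n = f_n(x)$ are functions that are $n - 1$ times differentiable on the
interval $(-\infty, \infty)$, then the determinant

$$W(x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}$$

is called the Wronskian of $f_1, f_2, \ldots, f_n$.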


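Suppose for the moment that $f_1 = f_1(x), f_2 = f_2(x), \ldots, f_n = f_n(x)$ are linearly dependent vectors in
$C^{(n-1)}(-\infty, \infty)$. This implies that the vector equation

$$k_1 f_1 + k_2 f_2 + \cdots + k_n f_n = 0$$

has a nontrivial solution, or equivalently that the equation

$$k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) = 0$$

is satisfied for all x in $(-\infty, \infty)$ by some scalars $k_1, k_2, \ldots, k_n$ that are not all zero. Using this
equation together with those that result by differentiating it $n - 1$ times yields the linear system

$$\begin{aligned} k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) &= 0 \\ k_1 f_1'(x) + k_2 f_2'(x) + \cdots + k_n f_n'(x) &= 0 \\ &\;\;\vdots \\ k_1 f_1^{(n-1)}(x) + k_2 f_2^{(n-1)}(x) + \cdots + k_n f_n^{(n-1)}(x) &= 0 \end{aligned}$$

Thus, the linear dependence of $f_1, f_2, \ldots, f_n$ implies that the linear system

$$\begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (10)$$

has a nontrivial solution for every x in $(-\infty, \infty)$. But this implies that the determinant of the coefficient
matrix of (10) is zero for every such x. Since this determinant is the Wronskian of $f_1, f_2, \ldots, f_n$, linear
dependence forces the Wronskian to be identically zero; taking the contrapositive, we have established the following result.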


THEOREM 4.3.4 

If the functions $f_1, f_2, \ldots, f_n$ have $n - 1$ continuous derivatives on the interval $(-\infty, \infty)$, and if the
Wronskian of these functions is not identically zero on $(-\infty, \infty)$, then these functions form a
linearly independent set of vectors in $C^{(n-1)}(-\infty, \infty)$.


In Example 8 we showed that x and sin x are linearly independent functions by observing that neither is a 
scalar multiple of the other. The following example shows how to obtain the same result using the Wronskian 
(though it is a more complicated procedure in this particular case). 

EXAMPLE 9 Linear Independence Using the Wronskian 


Use the Wronskian to show that $f_1 = x$ and $f_2 = \sin x$ are linearly independent.


The Wronskian is

$$W(x) = \begin{vmatrix} x & \sin x \\ 1 & \cos x \end{vmatrix} = x\cos x - \sin x$$


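This function is not identically zero on the interval $(-\infty, \infty)$ since, for example,

$$W\left(\tfrac{\pi}{2}\right) = \tfrac{\pi}{2}\cos\tfrac{\pi}{2} - \sin\tfrac{\pi}{2} = -1$$

Thus, the functions are linearly independent.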
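As an illustrative numerical check (not a proof), we can evaluate this 2 x 2 Wronskian in floating point. The helper names `wronskian_2`, `identity`, `one`, and `W` are hypothetical, and the derivatives are entered by hand rather than computed symbolically.

```python
import math


def wronskian_2(f, df, g, dg, x):
    """2x2 Wronskian f(x) g'(x) - g(x) f'(x); derivatives are supplied by hand."""
    return f(x) * dg(x) - g(x) * df(x)


def identity(t):
    return t


def one(t):
    return 1.0


def W(x):
    # f1 = x (derivative 1), f2 = sin x (derivative cos x)
    return wronskian_2(identity, one, math.sin, math.cos, x)


print(W(math.pi / 2))   # approximately -1, up to floating-point rounding
```

A single point where $W(x) \neq 0$ is enough for Theorem 4.3.4, which is why one sample value suffices in the argument above.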










WARNING 


The converse of Theorem 4.3.4 is false. If the
Wronskian of $f_1, f_2, \ldots, f_n$ is identically zero
on $(-\infty, \infty)$, then no conclusion can be
reached about the linear independence of
$\{f_1, f_2, \ldots, f_n\}$; this set of vectors may be
linearly independent or linearly dependent.

EXAMPLE 10 Linear Independence Using the Wronskian 

Use the Wronskian to show that $f_1 = 1$, $f_2 = e^x$, and $f_3 = e^{2x}$ are linearly independent.

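The Wronskian is

$$W(x) = \begin{vmatrix} 1 & e^x & e^{2x} \\ 0 & e^x & 2e^{2x} \\ 0 & e^x & 4e^{2x} \end{vmatrix} = 4e^{3x} - 2e^{3x} = 2e^{3x}$$

This function is positive for every x, so it is certainly not identically zero on $(-\infty, \infty)$, and hence $f_1$, $f_2$, and $f_3$
form a linearly independent set.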
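Expanding this Wronskian along its first column reduces it to $e^x \cdot 4e^{2x} - 2e^{2x} \cdot e^x = 2e^{3x}$. The floating-point sketch below compares a direct cofactor expansion with that closed form at a few arbitrary sample points; `det3` and `wronskian` are names we made up for the illustration, and the derivatives are written in by hand.

```python
import math


def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def wronskian(x):
    """Wronskian of 1, e^x, e^(2x) at x; derivatives entered by hand."""
    ex, e2x = math.exp(x), math.exp(2 * x)
    m = [[1.0, ex, e2x],          # f1, f2, f3
         [0.0, ex, 2 * e2x],      # first derivatives
         [0.0, ex, 4 * e2x]]      # second derivatives
    return det3(m)


for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, wronskian(x), 2 * math.exp(3 * x))   # the last two columns agree
```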

OPTIONAL 

We will close this section by proving part (a) of Theorem 4.3.1. We will leave the proof of part (b) as an
exercise.

Proof of Theorem 4.3.1 (a) Let $S = \{v_1, v_2, \ldots, v_r\}$ be a set with two or more vectors. If we assume
that S is linearly dependent, then there are scalars $k_1, k_2, \ldots, k_r$, not all zero, such that

$$k_1 v_1 + k_2 v_2 + \cdots + k_r v_r = 0 \qquad (11)$$


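To be specific, suppose that $k_1 \neq 0$. Then (11) can be rewritten as

$$v_1 = \left(-\frac{k_2}{k_1}\right)v_2 + \cdots + \left(-\frac{k_r}{k_1}\right)v_r$$

which expresses $v_1$ as a linear combination of the other vectors in S. Similarly, if $k_j \neq 0$ in (11) for some
$j = 2, 3, \ldots, r$, then $v_j$ is expressible as a linear combination of the other vectors in S.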

Conversely, let us assume that at least one of the vectors in S is expressible as a linear combination of the
other vectors. To be specific, suppose that

$$v_1 = c_2 v_2 + c_3 v_3 + \cdots + c_r v_r$$

so that

$$v_1 - c_2 v_2 - c_3 v_3 - \cdots - c_r v_r = 0$$

It follows that S is linearly dependent since the equation

$$k_1 v_1 + k_2 v_2 + \cdots + k_r v_r = 0$$

is satisfied by

$$k_1 = 1,\quad k_2 = -c_2,\quad \ldots,\quad k_r = -c_r$$

which are not all zero. The proof in the case where some vector other than $v_1$ is expressible as a linear
combination of the other vectors in S is similar.


Concept Review 

Trivial solution 
Linearly independent set 
Linearly dependent set 
Wronskian 

Skills 

Determine whether a set of vectors is linearly independent or linearly dependent. 

Express one vector in a linearly dependent set as a linear combination of the other vectors in the set. 
Use the Wronskian to show that a set of functions is linearly independent. 


Exercise Set 4.3 


1. Explain why the following are linearly dependent sets of vectors. (Solve this problem by inspection.)

(a) $u_1 = (-1, 2, 4)$ and $u_2 = (5, -10, -20)$ in $R^3$

(b) $u_1 = (3, -1)$, $u_2 = (4, 5)$, $u_3 = (-4, 7)$ in $R^2$

(c) $p_1 = 3 - 2x + x^2$ and $p_2 = 6 - 4x + 2x^2$ in $P_2$

(d) $A = \begin{bmatrix} -3 & 4 \\ 2 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & -4 \\ -2 & 0 \end{bmatrix}$ in $M_{22}$

Answer:

(a) $u_2$ is a scalar multiple of $u_1$.

(b) The vectors are linearly dependent by Theorem 4.3.3.

(c) $p_2$ is a scalar multiple of $p_1$.

(d) B is a scalar multiple of A.

2. Which of the following sets of vectors in $R^3$ are linearly dependent?

(a) (4, -1, 2), (-4, 10, 2)

(b) (-3, 0, 4), (5, -1, 2), (1, 1, 3)

(c) (8, -1, 3), (4, 0, 1)

(d) (-2, 0, 1), (3, 2, 5), (6, -1, 1), (7, 0, -2)

3. Which of the following sets of vectors in $R^4$ are linearly dependent?

(a) (3, 8, 7, -3), (1, 5, 3, -1), (2, -1, 2, 6), (1, 4, 0, 3)

(b) (0, 0, 2, 2), (3, 3, 0, 0), (1, 1, 0, -1)

(c) (0, 3, -3, -6), (-2, 0, 0, -6), (0, -4, -2, -2), (0, -8, 4, -4)

(d) (3, 0, -3, 6), (0, 2, 3, 1), (0, -2, -2, 0), (-2, 1, 2, 1)

Answer: 

None 

4. Which of the following sets of vectors in $P_2$ are linearly dependent?

(a) $2 - x + 4x^2$, $3 + 6x + 2x^2$, $2 + 10x - 4x^2$

(b) $3 + x + x^2$, $2 - x + 5x^2$, $4 - 3x^2$

(c) $6 - x^2$

(d) $1 + 3x + 3x^2$, $x + 4x^2$, $5 + 6x + 3x^2$, $7 + 2x - x^2$

5. Assume that $v_1$, $v_2$, and $v_3$ are vectors in $R^3$ that have their initial points at the origin. In each part,
determine whether the three vectors lie in a plane.

(a) $v_1 = (2, -2, 0)$, $v_2 = (6, 1, 4)$, $v_3 = (2, 0, -4)$

(b) $v_1 = (-6, 7, 2)$, $v_2 = (3, 2, 4)$, $v_3 = (4, -1, 2)$

Answer: 

(a) They do not lie in a plane. 

(b) They do lie in a plane. 

6. Assume that $v_1$, $v_2$, and $v_3$ are vectors in $R^3$ that have their initial points at the origin. In each part,
determine whether the three vectors lie on the same line.

(a) $v_1 = (-1, 2, 3)$, $v_2 = (2, -4, -6)$, $v_3 = (-3, 6, 0)$

(b) $v_1 = (2, -1, 4)$, $v_2 = (4, 2, 3)$, $v_3 = (2, 7, -6)$

(c) $v_1 = (4, 6, 8)$, $v_2 = (2, 3, 4)$, $v_3 = (-2, -3, -4)$

7. (a) Show that the three vectors $v_1 = (0, 3, 1, -1)$, $v_2 = (6, 0, 5, 1)$, and $v_3 = (4, -7, 1, 3)$ form a
linearly dependent set in $R^4$.

(b) Express each vector in part (a) as a linear combination of the other two.

Answer:

(a) $7v_1 - 2v_2 + 3v_3 = 0$

(b) $v_1 = \tfrac{2}{7}v_2 - \tfrac{3}{7}v_3$, $v_2 = \tfrac{7}{2}v_1 + \tfrac{3}{2}v_3$, $v_3 = -\tfrac{7}{3}v_1 + \tfrac{2}{3}v_2$



8. (a) Show that the three vectors $v_1 = (1, 2, 3, 4)$, $v_2 = (0, 1, 0, -1)$, and $v_3 = (1, 3, 3, 3)$ form a
linearly dependent set in $R^4$.

(b) Express each vector in part (a) as a linear combination of the other two.

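9. For which real values of $\lambda$ do the following vectors form a linearly dependent set in $R^3$?

$$v_1 = \left(\lambda, -\tfrac{1}{2}, -\tfrac{1}{2}\right),\quad v_2 = \left(-\tfrac{1}{2}, \lambda, -\tfrac{1}{2}\right),\quad v_3 = \left(-\tfrac{1}{2}, -\tfrac{1}{2}, \lambda\right)$$

Answer:

$\lambda = -\tfrac{1}{2}$, $\lambda = 1$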

10. Show that if $\{v_1, v_2, v_3\}$ is a linearly independent set of vectors, then so are
$\{v_1, v_2\}$, $\{v_1, v_3\}$, $\{v_2, v_3\}$, $\{v_1\}$, $\{v_2\}$, and $\{v_3\}$.

11. Show that if $S = \{v_1, v_2, \ldots, v_r\}$ is a linearly independent set of vectors, then so is every nonempty
subset of S.


12. Show that if $S = \{v_1, v_2, v_3\}$ is a linearly dependent set of vectors in a vector space V, and $v_4$ is any
vector in V that is not in S, then $\{v_1, v_2, v_3, v_4\}$ is also linearly dependent.

13. Show that if $S = \{v_1, v_2, \ldots, v_r\}$ is a linearly dependent set of vectors in a vector space V, and if
$v_{r+1}, \ldots, v_n$ are any vectors in V that are not in S, then $\{v_1, v_2, \ldots, v_r, v_{r+1}, \ldots, v_n\}$ is also linearly
dependent.

14. Show that in $P_2$ every set with more than three vectors is linearly dependent.

15. Show that if $\{v_1, v_2\}$ is linearly independent and $v_3$ does not lie in $\operatorname{span}\{v_1, v_2\}$, then $\{v_1, v_2, v_3\}$ is
linearly independent.

16. Prove: For any vectors u, v, and w in a vector space V, the vectors $u - v$, $v - w$, and $w - u$ form a
linearly dependent set.

17. Prove: The space spanned by two vectors in $R^3$ is a line through the origin, a plane through the origin, or
the origin itself.

18. Under what conditions is a set with one vector linearly independent? 

19. Are the vectors $v_1$, $v_2$, and $v_3$ in part (a) of the accompanying figure linearly independent? What about
those in part (b)? Explain.




Figure Ex-19 














Answer: 


(a) They are linearly independent since $v_1$, $v_2$, and $v_3$ do not lie in the same plane when they are placed
with their initial points at the origin.

(b) They are not linearly independent since $v_1$, $v_2$, and $v_3$ lie in the same plane when they are placed
with their initial points at the origin.

20. By using appropriate identities, where required, determine which of the following sets of vectors in
$F(-\infty, \infty)$ are linearly dependent.

(a) $6$, $3\sin^2 x$, $2\cos^2 x$

(b) $x$, $\cos x$

(c) $1$, $\sin x$, $\sin 2x$

(d) $\cos 2x$, $\sin x$, $\cos x$

(e) $(3 - x)^2$, $x^2 - 6x$, $5$

(f) $0$, $\cos 5\pi x$, $\sin 3\pi x$

21. The functions $f_1(x) = x$ and $f_2(x) = \cos x$ are linearly independent in $F(-\infty, \infty)$ because neither
function is a scalar multiple of the other. Confirm the linear independence using Wronski's test.

Answer:

$W(x) = -x\sin x - \cos x \neq 0$ for some x.

22. The functions $f_1(x) = \sin x$ and $f_2(x) = \cos x$ are linearly independent in $F(-\infty, \infty)$ because
neither function is a scalar multiple of the other. Confirm the linear independence using Wronski's test.

23. (Calculus required) Use the Wronskian to show that the following sets of vectors are linearly
independent.

(a) $1$, $x$, $e^x$

(b) $1$, $x$, $x^2$

Answer:

(a) $W(x) = e^x \neq 0$

(b) $W(x) = 2 \neq 0$

24. Show that the functions $f_1(x) = e^x$, $f_2(x) = xe^x$, and $f_3(x) = x^2 e^x$ are linearly independent.

25. Show that the functions $f_1(x) = \sin x$, $f_2(x) = \cos x$, and $f_3(x) = x\cos x$ are linearly independent.

Answer:

$W(x) = 2\sin x \neq 0$ for some x.

26. Use part (a) of Theorem 4.3.1 to prove part (b).



27. Prove part (b) of Theorem 4.3.2.

28. (a) In Example 1 we showed that the mutually orthogonal vectors i, j, and k form a linearly independent
set of vectors in $R^3$. Do you think that every set of three nonzero mutually orthogonal vectors in $R^3$ is
linearly independent? Justify your conclusion with a geometric argument.

(b) Justify your conclusion with an algebraic argument. [Hint: Use dot products.]

True-False Exercises 

In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 

(a) A set containing a single vector is linearly independent. 

Answer: 

False 

(b) The set of vectors $\{v, kv\}$ is linearly dependent for every scalar k.

Answer: 

True 

(c) Every linearly dependent set contains the zero vector. 

Answer: 

False 

(d) If the set of vectors $\{v_1, v_2, v_3\}$ is linearly independent, then $\{kv_1, kv_2, kv_3\}$ is also linearly
independent for every nonzero scalar k.

Answer: 

True 

(e) If $v_1, \ldots, v_n$ are linearly dependent nonzero vectors, then at least one vector $v_k$ is a unique linear
combination of $v_1, \ldots, v_{k-1}$.

Answer: 

True 

(f) The set of 2 x 2 matrices that contain exactly two 1's and two 0's is a linearly independent set in $M_{22}$.
Answer: 

False 

(g) The three polynomials $(x - 1)(x + 2)$, $x(x + 2)$, and $x(x - 1)$ are linearly independent.

Answer: 


True 


(h) The functions $f_1$ and $f_2$ are linearly dependent if there is a real number x so that
$k_1 f_1(x) + k_2 f_2(x) = 0$ for some scalars $k_1$ and $k_2$.

Answer: 

False 
4.4 Coordinates and Basis

We usually think of a line as being one-dimensional, a plane as two-dimensional, and the space around us as three- 
dimensional. It is the primary goal of this section and the next to make this intuitive notion of dimension precise. 
In this section we will discuss coordinate systems in general vector spaces and lay the groundwork for a precise 
definition of dimension in the next section. 


Coordinate Systems in Linear Algebra 

In analytic geometry we learned to use rectangular coordinate systems to create a one-to-one correspondence 
between points in 2-space and ordered pairs of real numbers and between points in 3-space and ordered triples of 
real numbers (Figure 4.4.1). Although rectangular coordinate systems are common, they are not essential. For 
example, Figure 4.4.2 shows coordinate systems in 2-space and 3-space in which the coordinate axes are not 
mutually perpendicular. 



Coordinates of P in a rectangular 
coordinate system in 2-space. 



Figure 4.4.1 



In linear algebra coordinate systems are commonly specified using vectors rather than coordinate axes. For 
example, in Figure 4.4.3 we have recreated the coordinate systems in Figure 4.4.2 by using unit vectors to identify 
the positive directions and then attaching coordinates to a point P using the scalar coefficients in the equations 

$$\overrightarrow{OP} = a u_1 + b u_2 \quad\text{and}\quad \overrightarrow{OP} = a u_1 + b u_2 + c u_3$$



























Points P(a, b) and P(a, b, c)

Figure 4.4.3


Units of measurement are essential ingredients of any coordinate system. In geometry problems one tries to use 
the same unit of measurement on all axes to avoid distorting the shapes of figures. This is less important in 
applications where coordinates represent physical quantities with diverse units (for example, time in seconds on 
one axis and temperature in degrees Celsius on another axis). To allow for this level of generality, we will relax 
the requirement that unit vectors be used to identify the positive directions and require only that those vectors be 
linearly independent. We will refer to these as the “basis vectors” for the coordinate system. In summary, it is the
directions of the basis vectors that establish the positive directions, and it is the lengths of the basis vectors that 
establish the spacing between the integer points on the axes (Figure 4.4.4). 



Panels: equal spacing, perpendicular axes; unequal spacing, perpendicular axes; equal spacing, skew axes; unequal spacing, skew axes

Figure 4.4.4


Basis for a Vector Space 

The following definition will make the preceding ideas more precise and will enable us to extend the concept of a 
coordinate system to general vector spaces. 

Note that in Definition 1 we have required a basis
to have finitely many vectors. Some authors call
this a finite basis, but we will not use this
terminology.

























DEFINITION 1 


If V is any vector space and $S = \{v_1, v_2, \ldots, v_n\}$ is a finite set of vectors in V, then S is called a basis for
V if the following two conditions hold:

(a) S is linearly independent. 

(b) S spans V. 


If you think of a basis as describing a coordinate system for a vector space V, then part (a) of this definition
guarantees that there is no interrelationship between the basis vectors, and part (b) guarantees that there are 
enough basis vectors to provide coordinates for all vectors in V. Here are some examples. 

EXAMPLE 1 The Standard Basis for $R^n$


Recall from Example 11 of Section 4.2 that the standard unit vectors 

$$e_1 = (1, 0, 0, \ldots, 0),\quad e_2 = (0, 1, 0, \ldots, 0),\quad \ldots,\quad e_n = (0, 0, 0, \ldots, 1)$$

span $R^n$ and from Example 1 of Section 4.3 that they are linearly independent. Thus, they form a
basis for $R^n$ that we call the standard basis for $R^n$. In particular,

$$i = (1, 0, 0),\quad j = (0, 1, 0),\quad k = (0, 0, 1)$$
is the standard basis for $R^3$.


EXAMPLE 2 The Standard Basis for $P_n$

Show that $S = \{1, x, x^2, \ldots, x^n\}$ is a basis for the vector space $P_n$ of polynomials of degree n or
less.

We must show that the polynomials in S are linearly independent and span $P_n$. Let us
denote these polynomials by

$$p_0 = 1,\quad p_1 = x,\quad p_2 = x^2,\quad \ldots,\quad p_n = x^n$$

We showed in Example 13 of Section 4.2 that these vectors span $P_n$ and in Example 4 of Section
4.3 that they are linearly independent. Thus, they form a basis for $P_n$ that we call the standard basis
for $P_n$.


EXAMPLE 3 Another Basis for $R^3$

Show that the vectors $v_1 = (1, 2, 1)$, $v_2 = (2, 9, 0)$, and $v_3 = (3, 3, 4)$ form a basis for $R^3$.

We must show that these vectors are linearly independent and span $R^3$. To prove linear
independence we must show that the vector equation

$$c_1 v_1 + c_2 v_2 + c_3 v_3 = 0 \qquad (1)$$

has only the trivial solution; and to prove that the vectors span $R^3$ we must show that every vector
$b = (b_1, b_2, b_3)$ in $R^3$ can be expressed as

$$c_1 v_1 + c_2 v_2 + c_3 v_3 = b \qquad (2)$$


By equating corresponding components on the two sides, these two equations can be expressed as
the linear systems

$$\begin{aligned} c_1 + 2c_2 + 3c_3 &= 0 \\ 2c_1 + 9c_2 + 3c_3 &= 0 \\ c_1 + 4c_3 &= 0 \end{aligned} \qquad\text{and}\qquad \begin{aligned} c_1 + 2c_2 + 3c_3 &= b_1 \\ 2c_1 + 9c_2 + 3c_3 &= b_2 \\ c_1 + 4c_3 &= b_3 \end{aligned} \qquad (3)$$

(verify). Thus, we have reduced the problem to showing that in (3) the homogeneous system has only
the trivial solution and that the nonhomogeneous system is consistent for all values of $b_1$, $b_2$, and $b_3$.
But the two systems have the same coefficient matrix


$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 9 & 3 \\ 1 & 0 & 4 \end{bmatrix}$$


so it follows from parts (b), (e), and (g) of Theorem 2.3.8 that we can prove both results at the same
time by showing that $\det(A) \neq 0$. We leave it for you to confirm that $\det(A) = -1$, which proves
that the vectors $v_1$, $v_2$, and $v_3$ form a basis for $R^3$.
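The determinant left to the reader can be checked with a cofactor expansion. Because $\det(A) \neq 0$, Cramer's rule then gives the unique coefficients in (2) for any b; the sketch below does this for the sample right-hand side b = (5, -1, 9), which we chose arbitrarily. The names `det3` and `solve_by_cramer` are our own.

```python
from fractions import Fraction

# Columns of A are v1 = (1, 2, 1), v2 = (2, 9, 0), v3 = (3, 3, 4)
A = [[1, 2, 3],
     [2, 9, 3],
     [1, 0, 4]]


def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_by_cramer(a, b):
    """Solve a c = b for a 3x3 matrix a with det(a) != 0 (Cramer's rule)."""
    d = det3(a)
    if d == 0:
        raise ValueError("det(A) = 0: no unique solution")
    c = []
    for j in range(3):
        # Replace column j of a by the right-hand side b
        aj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        c.append(Fraction(det3(aj), d))
    return c


print(det3(A))                         # -1
c = solve_by_cramer(A, [5, -1, 9])     # c1 v1 + c2 v2 + c3 v3 = (5, -1, 9)
print([int(x) for x in c])             # [1, -1, 2]
```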


EXAMPLE 4 The Standard Basis for $M_{mn}$

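Show that the matrices

$$M_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad M_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\quad M_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\quad M_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

form a basis for the vector space $M_{22}$ of 2 x 2 matrices.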


We must show that the matrices are linearly independent and span $M_{22}$. To prove linear
independence we must show that the equation

$$c_1 M_1 + c_2 M_2 + c_3 M_3 + c_4 M_4 = 0 \qquad (4)$$

has only the trivial solution, where 0 is the 2 x 2 zero matrix; and to prove that the matrices span
$M_{22}$ we must show that every 2 x 2 matrix

$$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

can be expressed as

$$c_1 M_1 + c_2 M_2 + c_3 M_3 + c_4 M_4 = B \qquad (5)$$














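The matrix forms of Equations 4 and 5 are

$$c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

and

$$c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

which can be rewritten as

$$\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

Since the first equation has only the trivial solution

$$c_1 = c_2 = c_3 = c_4 = 0$$

the matrices are linearly independent, and since the second equation has the solution

$$c_1 = a,\quad c_2 = b,\quad c_3 = c,\quad c_4 = d$$

the matrices span $M_{22}$. This proves that the matrices $M_1$, $M_2$, $M_3$, $M_4$ form a basis for $M_{22}$.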

More generally, the mn different matrices whose entries are zero except for a single entry of 1 form
a basis for $M_{mn}$ called the standard basis for $M_{mn}$.
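Example 4 boils down to the observation that the coefficients of a matrix relative to the standard basis are just its entries read row by row. A short sketch of that observation for general m x n, assuming the basis is ordered row by row (which matches $M_1, M_2, M_3, M_4$ above); the helper names `standard_basis`, `coordinates`, and `combine` are hypothetical.

```python
def standard_basis(m, n):
    """Standard basis of M_mn: one matrix per position (i, j), row by row,
    with a 1 in that position and 0 elsewhere."""
    basis = []
    for i in range(m):
        for j in range(n):
            E = [[0] * n for _ in range(m)]
            E[i][j] = 1
            basis.append(E)
    return basis


def coordinates(B):
    """Coefficients of B relative to standard_basis(m, n): its entries, row by row."""
    return tuple(x for row in B for x in row)


def combine(cs, basis):
    """The matrix c1 E1 + c2 E2 + ... for a list of same-size matrices."""
    m, n = len(basis[0]), len(basis[0][0])
    return [[sum(c * E[i][j] for c, E in zip(cs, basis)) for j in range(n)]
            for i in range(m)]


M = standard_basis(2, 2)            # M1, M2, M3, M4 of Example 4
B = [[7, -2], [0, 5]]
cs = coordinates(B)
print(cs)                           # (7, -2, 0, 5)
print(combine(cs, M) == B)          # True
```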


Some writers define the empty set to be a basis 
for the zero vector space, but we will not do so. 


It is not true that every vector space has a basis in the sense of Definition 1. The simplest example is the zero 
vector space, which contains no linearly independent sets and hence no basis. The following is an example of a 
nonzero vector space that has no basis in the sense of Definition 1 because it cannot be spanned by finitely many 
vectors. 

EXAMPLE 5 A Vector Space That Has No Finite Spanning Set 

Show that the vector space $P_\infty$ of all polynomials with real coefficients has no finite spanning set.

If there were a finite spanning set, say $S = \{p_1, p_2, \ldots, p_r\}$, then the degrees of the
polynomials in S would have a maximum value, say n; and this in turn would imply that any linear
combination of the polynomials in S would have degree at most n. Thus, there would be no way to
express the polynomial $x^{n+1}$ as a linear combination of the polynomials in S, contradicting the fact that
the vectors in S span $P_\infty$.


For reasons that will become clear shortly, a vector space that cannot be spanned by finitely many vectors is said
to be infinite-dimensional, whereas those that can are said to be finite-dimensional.


EXAMPLE 6 Some Finite- and Infinite-Dimensional Spaces






























In Example 1, Example 2, and Example 4 we found bases for $R^n$, $P_n$, and $M_{mn}$, so these vector
spaces are finite-dimensional. We showed in Example 5 that the vector space $P_\infty$ is not spanned by
finitely many vectors and hence is infinite-dimensional. In the exercises of this section and the next
we will ask you to show that the vector spaces $R^\infty$, $F(-\infty, \infty)$, $C(-\infty, \infty)$, $C^m(-\infty, \infty)$, and
$C^\infty(-\infty, \infty)$ are infinite-dimensional.


Coordinates Relative to a Basis 

Earlier in this section we drew an informal analogy between basis vectors and coordinate systems. Our next goal is 
to make this informal idea precise by defining the notion of a coordinate system in a general vector space. The 
following theorem will be our first step in that direction. 


THEOREM 4.4.1 Uniqueness of Basis Representation

If $S = \{v_1, v_2, \ldots, v_n\}$ is a basis for a vector space V, then every vector v in V can be expressed in the
form $v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$ in exactly one way.


Since S spans V, it follows from the definition of a spanning set that every vector in V is expressible as a
linear combination of the vectors in S. To see that there is only one way to express a vector as a linear combination
of the vectors in S, suppose that some vector v can be written as

$$v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$$

and also as

$$v = k_1 v_1 + k_2 v_2 + \cdots + k_n v_n$$

Subtracting the second equation from the first gives

$$0 = (c_1 - k_1)v_1 + (c_2 - k_2)v_2 + \cdots + (c_n - k_n)v_n$$

Since the right side of this equation is a linear combination of vectors in S, the linear independence of S implies
that

$$c_1 - k_1 = 0,\quad c_2 - k_2 = 0,\quad \ldots,\quad c_n - k_n = 0$$

that is,

$$c_1 = k_1,\quad c_2 = k_2,\quad \ldots,\quad c_n = k_n$$

Thus, the two expressions for v are the same.



Figure 4.4.5 


Sometimes it will be desirable to write a
coordinate vector as a column matrix, in which
case we will denote it using square brackets as

$$[v]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$$

We will refer to $[v]_S$ as a coordinate matrix and
reserve the terminology coordinate vector for the
comma-delimited form $(v)_S$.


We now have all of the ingredients required to define the notion of “coordinates” in a general vector space V. For
motivation, observe that in $R^3$, for example, the coordinates (a, b, c) of a vector v are precisely the coefficients in
the formula

$$v = ai + bj + ck$$

that expresses v as a linear combination of the standard basis vectors for $R^3$ (see Figure 4.4.5). The following
definition generalizes this idea.


DEFINITION 2

If $S = \{v_1, v_2, \ldots, v_n\}$ is a basis for a vector space V, and

$$v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$$

is the expression for a vector v in terms of the basis S, then the scalars $c_1, c_2, \ldots, c_n$ are called the coordinates of v relative to the basis S. The vector $(c_1, c_2, \ldots, c_n)$ in $R^n$ constructed from these coordinates is called the coordinate vector of v relative to S; it is denoted by

$$(v)_S = (c_1, c_2, \ldots, c_n) \tag{6}$$


Recall that two sets are considered to be the same if they have the same members, even if those members are written in a different order. However, if $S = \{v_1, v_2, \ldots, v_n\}$ is a set of basis vectors, then changing the order in which the vectors are written would change the order of the entries in $(v)_S$, possibly producing a different coordinate vector. To avoid this complication, we will make the convention that in any discussion involving a basis S the order of the vectors in S remains fixed. Some authors call a set of basis vectors with this restriction an ordered basis. However, we will use this terminology only when emphasis on the order is required for clarity.


Observe that $(v)_S$ is a vector in $R^n$, so that once a basis S is given for a vector space V, Theorem 4.4.1 establishes a one-to-one correspondence between vectors in V and vectors in $R^n$ (Figure 4.4.6).

Figure 4.4.6 A one-to-one correspondence $v \leftrightarrow (v)_S$ between V and $R^n$


EXAMPLE 7 Coordinates Relative to the Standard Basis for $R^n$

In the special case where $V = R^n$ and S is the standard basis, the coordinate vector $(v)_S$ and the vector v are the same; that is,

$$v = (v)_S$$

For example, in $R^3$ the representation of a vector v = (a, b, c) as a linear combination of the vectors in the standard basis $S = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ is

$$v = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$$

so the coordinate vector relative to this basis is $(v)_S = (a, b, c)$, which is the same as the vector v.


EXAMPLE 8 Coordinate Vectors Relative to Standard Bases

(a) Find the coordinate vector for the polynomial

$$p(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$$

relative to the standard basis for the vector space $P_n$.

(b) Find the coordinate vector of

$$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

relative to the standard basis for $M_{22}$.

Solution

(a) The given formula for p(x) expresses this polynomial as a linear combination of the standard basis vectors $S = \{1, x, x^2, \ldots, x^n\}$. Thus, the coordinate vector for p relative to S is

$$(p)_S = (c_0, c_1, c_2, \ldots, c_n)$$

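(b) We showed in Example 4 that the representation of a vector

$$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

as a linear combination of the standard basis vectors is

$$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + d\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

so the coordinate vector of B relative to S is

$$(B)_S = (a, b, c, d)$$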
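Example 8 amounts to reading off coefficients in a fixed order. The Python sketch below illustrates this under stated assumptions: a polynomial is given as its list of coefficients in increasing degree, and the standard basis of $M_{mn}$ is ordered row by row, as in part (b). The helper names `poly_coords`, `matrix_coords`, `standard_basis_M`, and `combine` are hypothetical, chosen only for this illustration.

```python
def poly_coords(coeffs, n):
    """(p)_S relative to S = {1, x, ..., x^n}.

    `coeffs` holds c0, c1, c2, ... in increasing degree (an assumed input
    format); missing higher-degree coefficients are padded with zeros.
    """
    if len(coeffs) > n + 1:
        raise ValueError("polynomial has degree greater than n")
    return tuple(coeffs) + (0,) * (n + 1 - len(coeffs))


def matrix_coords(B):
    """(B)_S relative to the standard basis of M_mn, read row by row."""
    return tuple(entry for row in B for entry in row)


def standard_basis_M(m, n):
    """Standard basis matrices E_11, E_12, ..., E_mn in row-by-row order."""
    basis = []
    for i in range(m):
        for j in range(n):
            E = [[0] * n for _ in range(m)]
            E[i][j] = 1
            basis.append(E)
    return basis


def combine(coords, basis):
    """Rebuild the matrix c_1 E_1 + c_2 E_2 + ... from its coordinates."""
    m, n = len(basis[0]), len(basis[0][0])
    return [[sum(c * E[i][j] for c, E in zip(coords, basis)) for j in range(n)]
            for i in range(m)]


# p(x) = 3 - 2x^2 viewed as a vector in P_3
print(poly_coords([3, 0, -2], 3))                               # (3, 0, -2, 0)
B = [[1, 2], [3, 4]]
print(matrix_coords(B))                                         # (1, 2, 3, 4)
print(combine(matrix_coords(B), standard_basis_M(2, 2)) == B)   # True
```

Rebuilding B from $(B)_S$ with `combine` runs the same one-to-one correspondence in the opposite direction.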


EXAMPLE 9 Coordinates in $R^3$

(a) We showed in Example 3 that the vectors

$$v_1 = (1, 2, 1), \quad v_2 = (2, 9, 0), \quad v_3 = (3, 3, 4)$$

form a basis for $R^3$. Find the coordinate vector of $v = (5, -1, 9)$ relative to the basis $S = \{v_1, v_2, v_3\}$.

(b) Find the vector v in $R^3$ whose coordinate vector relative to S is $(v)_S = (-1, 3, 2)$.

Solution

(a) To find $(v)_S$ we must first express v as a linear combination of the vectors in S; that is, we must find values of $c_1$, $c_2$, and $c_3$ such that

$$v = c_1 v_1 + c_2 v_2 + c_3 v_3$$

or, in terms of components,

$$(5, -1, 9) = c_1(1, 2, 1) + c_2(2, 9, 0) + c_3(3, 3, 4)$$

Equating corresponding components gives

$$\begin{aligned} c_1 + 2c_2 + 3c_3 &= 5 \\ 2c_1 + 9c_2 + 3c_3 &= -1 \\ c_1 + 4c_3 &= 9 \end{aligned}$$

Solving this system we obtain $c_1 = 1$, $c_2 = -1$, $c_3 = 2$ (verify). Therefore,

$$(v)_S = (1, -1, 2)$$

(b) Using the definition of $(v)_S$, we obtain

$$v = (-1)v_1 + 3v_2 + 2v_3 = (-1)(1, 2, 1) + 3(2, 9, 0) + 2(3, 3, 4) = (11, 31, 7)$$
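Part (a) of Example 9 is a 3 × 3 linear system whose coefficient matrix has the basis vectors as its columns, and part (b) is a single linear combination. A minimal Python sketch of both steps follows, assuming vectors are tuples of integers or fractions; `solve_square`, `coordinates`, and `from_coordinates` are hypothetical names, and exact `Fraction` arithmetic is used so the results match the hand computation.

```python
from fractions import Fraction


def solve_square(A, b):
    """Solve A x = b (A square) by Gauss-Jordan elimination in exact arithmetic.

    Raises ValueError when A is singular, i.e. when its columns are not a basis.
    """
    n = len(A)
    # augmented matrix [A | b] with Fraction entries
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]          # make the pivot 1
        for r in range(n):                        # clear the rest of the column
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return tuple(M[r][n] for r in range(n))


def coordinates(basis, v):
    """(v)_S: the basis vectors become the columns of the coefficient matrix."""
    if len(basis) != len(v):
        raise ValueError("need exactly n basis vectors for a vector in R^n")
    A = [[vec[i] for vec in basis] for i in range(len(v))]
    return solve_square(A, v)


def from_coordinates(basis, coords):
    """Rebuild v = c1 v1 + c2 v2 + ... + cn vn from (v)_S."""
    return tuple(sum(c * vec[i] for c, vec in zip(coords, basis))
                 for i in range(len(basis[0])))


S = [(1, 2, 1), (2, 9, 0), (3, 3, 4)]
print([int(c) for c in coordinates(S, (5, -1, 9))])   # [1, -1, 2]
print(from_coordinates(S, (-1, 3, 2)))                # (11, 31, 7)
```

Because the solution is unique (Theorem 4.4.1), `coordinates` and `from_coordinates` undo each other for any basis of $R^n$.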













Concept Review 

Basis 

Standard bases for $R^n$, $P_n$, $M_{mn}$
Finite-dimensional
Infinite-dimensional 
Coordinates 
Coordinate vector 

Skills 

Show that a set of vectors is a basis for a vector space. 
Find the coordinates of a vector relative to a basis. 

Find the coordinate vector of a vector relative to a basis. 


Exercise Set 4.4 


1. In words, explain why the following sets of vectors are not bases for the indicated vector spaces.

(a) $u_1 = (1, 2)$, $u_2 = (0, 3)$, $u_3 = (2, 7)$ for $R^2$

(b) $u_1 = (-1, 3, 2)$, $u_2 = (6, 1, 1)$ for $R^3$

(c) $p_1 = 1 + x + x^2$, $p_2 = x - 1$ for $P_2$

(d) $A = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 6 & 0 \\ -1 & 4 \end{bmatrix}$, $C = \begin{bmatrix} 3 & 0 \\ 1 & 7 \end{bmatrix}$, $D = \begin{bmatrix} 5 & 1 \\ 4 & 2 \end{bmatrix}$, $E = \begin{bmatrix} 1 & 1 \\ 2 & 9 \end{bmatrix}$ for $M_{22}$

Answer:

(a) A basis for $R^2$ has two linearly independent vectors.

(b) A basis for $R^3$ has three linearly independent vectors.

(c) A basis for $P_2$ has three linearly independent vectors.

(d) A basis for $M_{22}$ has four linearly independent vectors.

2. Which of the following sets of vectors are bases for $R^2$?

(a) {(2, 1), (3, 0)}

(b) {(4, 1), (-7, -8)}

(c) {(0, 0), (1, 3)}

(d) {(3, 9), (-4, -12)}

3. Which of the following sets of vectors are bases for $R^3$?

(a) {(1, 0, 0), (2, 2, 0), (3, 3, 3)}

(b) {(3, 1, -4), (2, 5, 6), (1, 4, 8)}

(c) {(2, -3, 1), (4, 1, 1), (0, -7, 1)}

(d) {(1, 6, 4), (2, 4, -1), (-1, 2, 5)}

Answer:

(a), (b)

4. Which of the following form bases for $P_2$?

(a) $1 - 3x + 2x^2$, $1 + x + 4x^2$, $1 - 7x$

(b) $4 + 6x + x^2$, $-1 + 4x + 2x^2$, $5 + 2x - x^2$

(c) $1 + x + x^2$, $x + x^2$, $x^2$

(d) $-4 + x + 3x^2$, $6 + 5x + 2x^2$, $8 + 4x + x^2$

5. Show that the following matrices form a basis for $M_{22}$.



6. Let V be the space spanned by $v_1 = \cos^2 x$, $v_2 = \sin^2 x$, $v_3 = \cos 2x$.

(a) Show that $S = \{v_1, v_2, v_3\}$ is not a basis for V.

(b) Find a basis for V.

7. Find the coordinate vector of w relative to the basis $S = \{u_1, u_2\}$ for $R^2$.

(a) $u_1 = (1, 0)$, $u_2 = (0, 1)$; $w = (3, -7)$

(b) $u_1 = (2, -4)$, $u_2 = (3, 8)$; $w = (1, 1)$

(c) $u_1 = (1, 1)$, $u_2 = (0, 2)$; $w = (a, b)$

Answer:

(a) $(w)_S = (3, -7)$



8. Find the coordinate vector of w relative to the basis $S = \{u_1, u_2\}$ of $R^2$.

(a) $u_1 = (1, -1)$, $u_2 = (1, 1)$; $w = (1, 0)$

(b) $u_1 = (1, -1)$, $u_2 = (1, 1)$; $w = (0, 1)$

(c) $u_1 = (1, -1)$, $u_2 = (1, 1)$; $w = (1, 1)$

9. Find the coordinate vector of v relative to the basis $S = \{v_1, v_2, v_3\}$.

(a) $v = (2, -1, 3)$; $v_1 = (1, 0, 0)$, $v_2 = (2, 2, 0)$, $v_3 = (3, 3, 3)$

(b) $v = (5, -12, 3)$; $v_1 = (1, 2, 3)$, $v_2 = (-4, 5, 6)$, $v_3 = (7, -8, 9)$

Answer:

(a) $(v)_S = (3, -2, 1)$

(b) $(v)_S = (-2, 0, 1)$



10. Find the coordinate vector of p relative to the basis $S = \{p_1, p_2, p_3\}$.

(a) $p = 4 - 3x + x^2$; $p_1 = 1$, $p_2 = x$, $p_3 = x^2$

(b) $p = 2 - x + x^2$; $p_1 = 1 + x$, $p_2 = 1 + x^2$, $p_3 = x + x^2$


11. Find the coordinate vector of A relative to the basis $S = \{A_1, A_2, A_3, A_4\}$.

$A = \begin{bmatrix} 2 & 0 \\ -1 & 3 \end{bmatrix}$; $A_1 = \begin{bmatrix} -1 & 1 \\ 0 & 0 \end{bmatrix}$

Answer:

$(A)_S = (-1, 1, -1, 3)$

In Exercises 12-13, show that $\{A_1, A_2, A_3, A_4\}$ is a basis for $M_{22}$, and express A as a linear combination of the basis vectors.

Answer:

$A = A_1 - A_2 + A_3 - A_4$

In Exercises 14-15, show that $\{p_1, p_2, p_3\}$ is a basis for $P_2$, and express p as a linear combination of the basis vectors.

14. $p_1 = 1 + 2x + x^2$, $p_2 = 2 + 9x$, $p_3 = 3 + 3x + 4x^2$; $p = 2 + 17x - 3x^2$

15. $p_1 = 1 + x + x^2$, $p_2 = x + x^2$, $p_3 = x^2$; $p = 7 - x + 2x^2$

Answer:

$p = 7p_1 - 8p_2 + 3p_3$

16. The accompanying figure shows a rectangular xy-coordinate system and an x'y'-coordinate system with skewed axes. Assuming that 1-unit scales are used on all the axes, find the x'y'-coordinates of the points whose xy-coordinates are given.

(a) (1, 1)

(b) (1, 0)

(c) (0, 1)

(d) (a, b)




17. The accompanying figure shows a rectangular xy-coordinate system determined by the unit basis vectors i and j and an x'y'-coordinate system determined by unit basis vectors $u_1$ and $u_2$. Find the x'y'-coordinates of the points whose xy-coordinates are given.

(a) $(\sqrt{3}, 1)$

(b) (1, 0)

(c) (0, 1)

(d) (a, b)

Figure Ex-17 (the y- and y'-axes coincide; the x'-axis is skewed)

Answer:

(a) (2, 0)

(b) $\left(\frac{2}{\sqrt{3}}, -\frac{1}{\sqrt{3}}\right)$

(c) (0, 1)

(d) $\left(\frac{2}{\sqrt{3}}a, \; b - \frac{1}{\sqrt{3}}a\right)$


18. The basis that we gave for $M_{22}$ in Example 4 consisted of noninvertible matrices. Do you think that there is a basis for $M_{22}$ consisting of invertible matrices? Justify your answer.

19. Prove that $R^\infty$ is infinite-dimensional.


True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer.

(a) If $V = \operatorname{span}\{v_1, \ldots, v_n\}$, then $\{v_1, \ldots, v_n\}$ is a basis for V.






Answer: 


False 

(b) Every linearly independent subset of a vector space V is a basis for V. 

Answer: 

False 

(c) If $\{v_1, v_2, \ldots, v_n\}$ is a basis for a vector space V, then every vector in V can be expressed as a linear combination of $v_1, v_2, \ldots, v_n$.

Answer: 

True 

(d) The coordinate vector of a vector x in $R^n$ relative to the standard basis for $R^n$ is x.

Answer: 

True 

(e) Every basis of $P_4$ contains at least one polynomial of degree 3 or less.

Answer: 

False 
4.5 Dimension

We showed in the previous section that the standard basis for $R^n$ has n vectors and hence that the standard basis for $R^3$ has three vectors, the standard basis for $R^2$ has two vectors, and the standard basis for $R^1$ (= R) has one vector. Since we think of space as three-dimensional, a plane as two-dimensional, and a line as one-dimensional, there seems to be a link between the number of vectors in a basis and the dimension of a vector space. We will develop this idea in this section.


Number of Vectors in a Basis 

Our first goal in this section is to establish the following fundamental theorem. 


THEOREM 4.5.1 

All bases for a finite-dimensional vector space have the same number of vectors. 


To prove this theorem we will need the following preliminary result, whose proof is deferred to the end of the 
section. 


THEOREM 4.5.2 

Let V be a finite-dimensional vector space, and let $\{v_1, v_2, \ldots, v_n\}$ be any basis.

(a) If a set has more than n vectors, then it is linearly dependent. 

(b) If a set has fewer than n vectors, then it does not span V. 


Some writers regard the empty set to be a basis 
for the zero vector space. This is consistent with 
our definition of dimension, since the empty set 
has no vectors and the zero vector space has 
dimension zero. 


We can now see rather easily why Theorem 4.5.1 is true; for if

$$S = \{v_1, v_2, \ldots, v_n\}$$

is an arbitrary basis for V, then Theorem 4.5.2 implies that any set in V with more than n vectors is linearly dependent and any set in V with fewer than n vectors does not span V. Thus, unless a set in V has exactly n vectors, it cannot be a basis.
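In $R^n$ both parts of Theorem 4.5.2 can be observed with a rank computation, using the standard fact that the rank of a matrix (the number of pivots after row reduction) equals the dimension of the span of its rows. The Python sketch below assumes vectors are given as tuples of numbers; `rank`, `is_independent`, and `spans_Rn` are hypothetical helper names, and the sample vectors are those of Exercise 1(a) and 1(b) in Exercise Set 4.4.

```python
from fractions import Fraction


def rank(vectors):
    """Number of pivots after forward elimination on the matrix whose rows are
    `vectors`; this equals the dimension of their span."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_independent(vectors):
    return rank(vectors) == len(vectors)


def spans_Rn(vectors, n):
    return rank(vectors) == n


# Exercise 1(a): three vectors in R^2 must be dependent (Theorem 4.5.2(a))
print(is_independent([(1, 2), (0, 3), (2, 7)]))    # False
# Exercise 1(b): two vectors cannot span R^3 (Theorem 4.5.2(b))
print(spans_Rn([(-1, 3, 2), (6, 1, 1)], 3))         # False
```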


We noted in the introduction to this section that for certain familiar vector spaces the intuitive notion of 
dimension coincides with the number of vectors in a basis. The following definition makes this idea precise. 

Engineers often use the term degrees of 
freedom as a synonym for dimension. 


DEFINITION 1

The dimension of a finite-dimensional vector space V is denoted by dim(V) and is defined to be the number of vectors in a basis for V. In addition, the zero vector space is defined to have dimension zero.


EXAMPLE 1 Dimensions of Some Familiar Vector Spaces

$\dim(R^n) = n$ (the standard basis has n vectors)
$\dim(P_n) = n + 1$ (the standard basis has n + 1 vectors)
$\dim(M_{mn}) = mn$ (the standard basis has mn vectors)


EXAMPLE 2 Dimension of Span(S)

If $S = \{v_1, v_2, \ldots, v_r\}$ is a linearly independent set in a vector space V, then S is automatically a basis for span(S) (why?), and this implies that

$$\dim[\operatorname{span}(S)] = r$$

In words, the dimension of the space spanned by a linearly independent set of vectors is equal to the number of vectors in that set.


EXAMPLE 3 Dimension of a Solution Space

Find a basis for and the dimension of the solution space of the homogeneous system

$$\begin{aligned} 2x_1 + 2x_2 - x_3 + x_5 &= 0 \\ -x_1 - x_2 + 2x_3 - 3x_4 + x_5 &= 0 \\ x_1 + x_2 - 2x_3 - x_5 &= 0 \\ x_3 + x_4 + x_5 &= 0 \end{aligned}$$

Solution

We leave it for you to solve this system by Gauss-Jordan elimination and show that its general solution is

$$x_1 = -s - t, \quad x_2 = s, \quad x_3 = -t, \quad x_4 = 0, \quad x_5 = t$$

which can be written in vector form as

$$(x_1, x_2, x_3, x_4, x_5) = (-s - t, s, -t, 0, t)$$

or, alternatively, as

$$(x_1, x_2, x_3, x_4, x_5) = s(-1, 1, 0, 0, 0) + t(-1, 0, -1, 0, 1)$$

This shows that the vectors $v_1 = (-1, 1, 0, 0, 0)$ and $v_2 = (-1, 0, -1, 0, 1)$ span the solution space. Since neither vector is a scalar multiple of the other, they are linearly independent and hence form a basis for the solution space. Thus, the solution space has dimension 2.
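The Gauss-Jordan step left to the reader in Example 3 can be sketched in Python. The code below is a minimal sketch using exact `Fraction` arithmetic: it reduces the coefficient matrix and then builds one solution for each free variable by setting that variable to 1 and the other free variables to 0, which reproduces the vectors obtained from the parameters s and t. The names `rref` and `null_space_basis` are hypothetical helpers.

```python
from fractions import Fraction


def rref(A):
    """Reduced row echelon form of A (exact arithmetic) and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: a free variable
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def null_space_basis(A):
    """One vector per free variable: set it to 1, the other free variables
    to 0, and read the leading variables off the reduced rows."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for row, pc in zip(R, pivots):
            x[pc] = -row[f]
        basis.append(tuple(x))
    return basis


A = [[2, 2, -1, 0, 1],
     [-1, -1, 2, -3, 1],
     [1, 1, -2, 0, -1],
     [0, 0, 1, 1, 1]]
basis = null_space_basis(A)
print([[int(x) for x in v] for v in basis])   # [[-1, 1, 0, 0, 0], [-1, 0, -1, 0, 1]]
print("dimension:", len(basis))               # dimension: 2
```

Applied to the coefficient matrix of Example 4, the same function returns the three vectors found there.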


EXAMPLE 4 Dimension of a Solution Space

Find a basis for and the dimension of the solution space of the homogeneous system

$$\begin{aligned} x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\ 2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\ 5x_3 + 10x_4 + 15x_6 &= 0 \\ 2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0 \end{aligned}$$


Solution

In Example 6 of Section 1.2 we found the solution of this system to be

$$x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0$$

which can be written in vector form as

$$(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t, r, -2s, s, t, 0)$$

or, alternatively, as

$$(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0)$$

This shows that the vectors

$$v_1 = (-3, 1, 0, 0, 0, 0), \quad v_2 = (-4, 0, -2, 1, 0, 0), \quad v_3 = (-2, 0, 0, 0, 1, 0)$$

span the solution space. We leave it for you to check that these vectors are linearly independent by showing that none of them is a linear combination of the other two (but see the remark that follows). Thus, the solution space has dimension 3.


It can be shown that for a homogeneous linear system, the method of the last example always 
produces a basis for the solution space of the system. We omit the formal proof. 
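The remark can be made concrete for Example 4. Each vector produced by the parametric method has a 1 in the position of its own parameter (here $x_2$, $x_4$, $x_5$ for r, s, t) and 0 in the positions of the other parameters, so reading only those positions of $a_1v_1 + a_2v_2 + a_3v_3 = 0$ forces $a_1 = a_2 = a_3 = 0$. The Python sketch below checks both that the vectors solve the system and that they have this pattern; the helper names are hypothetical and positions are 0-based.

```python
def mat_vec(A, x):
    """Matrix-vector product with plain Python lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solves_homogeneous(A, vectors):
    """True if every vector satisfies A x = 0."""
    return all(all(entry == 0 for entry in mat_vec(A, v)) for v in vectors)


def identity_on_free_positions(vectors, free):
    """True if, read only at the free-variable positions, the vectors are the
    rows of an identity matrix; then a1 v1 + ... + ak vk = 0 forces every ai = 0."""
    k = len(vectors)
    actual = [[v[f] for f in free] for v in vectors]
    identity = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    return actual == identity


A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
basis = [(-3, 1, 0, 0, 0, 0), (-4, 0, -2, 1, 0, 0), (-2, 0, 0, 0, 1, 0)]
free = [1, 3, 4]   # 0-based positions of x2, x4, x5 (the parameters r, s, t)

print(solves_homogeneous(A, basis))              # True
print(identity_on_free_positions(basis, free))   # True
```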


Some Fundamental Theorems 

We will devote the remainder of this section to a series of theorems that reveal the subtle interrelationships 
among the concepts of linear independence, basis, and dimension. These theorems are not simply exercises in 
mathematical theory—they are essential to the understanding of vector spaces and the applications that build 
on them. 


We will start with a theorem (proved at the end of this section) that is concerned with the effect on linear 
independence and spanning if a vector is added to or removed from a given nonempty set of vectors. 
Informally stated, if you start with a linearly independent set S and adjoin to it a vector that is not a linear 
combination of those in S , then the enlarged set will still be linearly independent. Also, if you start with a set S 
of two or more vectors in which one of the vectors is a linear combination of the others, then that vector can be 
removed from S without affecting span(S) (Figure 4.5.1). 


Figure 4.5.1 The vector outside the plane can be adjoined to the other two without affecting their linear independence. Any of the vectors can be removed, and the remaining two will still span the plane. Either of the collinear vectors can be removed, and the remaining two will still span the plane.


THEOREM 4.5.3 Plus/Minus Theorem

Let S be a nonempty set of vectors in a vector space V.

(a) If S is a linearly independent set, and if v is a vector in V that is outside of span(S), then the set $S \cup \{v\}$ that results by inserting v into S is still linearly independent.

(b) If v is a vector in S that is expressible as a linear combination of other vectors in S, and if $S - \{v\}$ denotes the set obtained by removing v from S, then S and $S - \{v\}$ span the same space; that is,

$$\operatorname{span}(S) = \operatorname{span}(S - \{v\})$$
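Both parts of the Plus/Minus Theorem can be observed numerically in $R^3$ by comparing ranks, using the standard fact that v lies in span(S) exactly when adjoining v does not increase the rank. The sketch below uses the linearly independent vectors (1, -2, 3) and (0, 5, -3) from Exercise 15 of Exercise Set 4.5; the test vectors `outside` and `inside` and the helper names `rank` and `in_span` are illustrative choices, not part of the text.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of the span of `vectors`, by exact forward elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def in_span(S, v):
    """v lies in span(S) exactly when adjoining it does not raise the rank."""
    return rank(S + [v]) == rank(S)


S = [(1, -2, 3), (0, 5, -3)]   # linearly independent (rank 2)
outside = (1, 0, 0)            # not in span(S)
inside = (1, 3, 0)             # equals (1, -2, 3) + (0, 5, -3)

# Part (a): adjoining a vector outside span(S) keeps the set independent
print(in_span(S, outside), rank(S + [outside]) == len(S) + 1)   # False True
# Part (b): T = S plus `inside`; since span(S) is contained in span(T) and the
# ranks agree, removing `inside` from T leaves the span unchanged
T = S + [inside]
print(in_span(S, inside), rank(T) == rank(S))                   # True True
```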


EXAMPLE 5 Applying the Plus/Minus Theorem

Show that $p_1$, $p_2$, and $p_3$ are linearly independent vectors.

Solution

The set $S = \{p_1, p_2\}$ is linearly independent, since neither vector in S is a scalar multiple of the other. Since the vector $p_3$ cannot be expressed as a linear combination of the vectors in S (why?), it can be adjoined to S to produce a linearly independent set $S' = \{p_1, p_2, p_3\}$.


In general, to show that a set of vectors $\{v_1, v_2, \ldots, v_n\}$ is a basis for a vector space V, we must show that the vectors are linearly independent and span V. However, if we happen to know that V has dimension n (so that $\{v_1, v_2, \ldots, v_n\}$ contains the right number of vectors for a basis), then it suffices to check either linear independence or spanning; the remaining condition will hold automatically. This is the content of the following theorem.


THEOREM 4.5.4

Let V be an n-dimensional vector space, and let S be a set in V with exactly n vectors. Then S is a basis for V if and only if S spans V or S is linearly independent.


Proof Assume that S has exactly n vectors and spans V. To prove that S is a basis, we must show that S is a linearly independent set. But if this is not so, then some vector v in S is a linear combination of the remaining vectors. If we remove this vector from S, then it follows from Theorem 4.5.3(b) that the remaining set of $n - 1$ vectors still spans V. But this is impossible, since it follows from Theorem 4.5.2(b) that no set with fewer than n vectors can span an n-dimensional vector space. Thus S is linearly independent.

Assume that S has exactly n vectors and is a linearly independent set. To prove that S is a basis, we must show that S spans V. But if this is not so, then there is some vector v in V that is not in span(S). If we insert this vector into S, then it follows from Theorem 4.5.3(a) that this set of $n + 1$ vectors is still linearly independent. But this is impossible, since Theorem 4.5.2(a) states that no set with more than n vectors in an n-dimensional vector space can be linearly independent. Thus S spans V.

EXAMPLE 6 Bases by Inspection

(a) By inspection, explain why $v_1 = (-3, 7)$ and $v_2 = (5, 5)$ form a basis for $R^2$.

(b) By inspection, explain why $v_1 = (2, 0, -1)$, $v_2 = (4, 0, 7)$, and $v_3 = (-1, 1, 4)$ form a basis for $R^3$.

Solution

(a) Since neither vector is a scalar multiple of the other, the two vectors form a linearly independent set in the two-dimensional space $R^2$, and hence they form a basis by Theorem 4.5.4.

(b) The vectors $v_1$ and $v_2$ form a linearly independent set in the xz-plane (why?). The vector $v_3$ is outside of the xz-plane, so the set $\{v_1, v_2, v_3\}$ is also linearly independent. Since $R^3$ is three-dimensional, Theorem 4.5.4 implies that $\{v_1, v_2, v_3\}$ is a basis for $R^3$.
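For n vectors in $R^n$, Theorem 4.5.4 reduces the basis question to linear independence, which can also be tested with a determinant, since a square matrix has a nonzero determinant exactly when its rows are linearly independent. The Python sketch below is a computational cross-check of Example 6, not the by-inspection reasoning used there; `det` and `is_basis_of_Rn` are hypothetical helper names.

```python
from fractions import Fraction


def det(A):
    """Determinant by reduction to upper-triangular form (exact fractions)."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign, result = 1, Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)            # no pivot: rows are dependent
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign                  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return sign * result


def is_basis_of_Rn(vectors):
    """n vectors in R^n form a basis iff the matrix with these rows is invertible."""
    n = len(vectors)
    return all(len(v) == n for v in vectors) and det(vectors) != 0


print(det([(-3, 7), (5, 5)]))                     # -50
print(det([(2, 0, -1), (4, 0, 7), (-1, 1, 4)]))   # -18
print(is_basis_of_Rn([(2, 0, -1), (4, 0, 7), (-1, 1, 4)]))   # True
```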


The next theorem (whose proof is deferred to the end of this section) reveals two important facts about the 
vectors in a finite-dimensional vector space V: 

Every spanning set for a subspace is either a basis for that subspace or has a basis as a subset. 

Every linearly independent set in a subspace is either a basis for that subspace or can be extended to a basis 
for it. 


THEOREM 4.5.5 


Let S be a finite set of vectors in a finite-dimensional vector space V. 

(a) If S spans V but is not a basis for V, then S can be reduced to a basis for V by removing appropriate vectors from S.

(b) If S is a linearly independent set that is not already a basis for V, then S can be enlarged to a basis for V by inserting appropriate vectors into S.
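Part (b) of Theorem 4.5.5 does not say which vectors to insert. In $R^n$ one convenient choice, the one asked for in Exercises 12 and 13 of Exercise Set 4.5, is to try the standard basis vectors $e_1, \ldots, e_n$ in order and keep each one that lies outside the current span; because the standard basis spans $R^n$, the process cannot stop short of n vectors. The Python sketch below assumes the input set is linearly independent; `rank` and `extend_to_basis` are hypothetical names.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of the span of `vectors`, by exact forward elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(S, n):
    """Adjoin standard basis vectors of R^n lying outside the current span
    until the set has n vectors (Plus/Minus Theorem, part (a), at each step)."""
    basis = list(S)
    if rank(basis) != len(basis):
        raise ValueError("S must be linearly independent")
    for i in range(n):
        if len(basis) == n:
            break
        e = tuple(1 if j == i else 0 for j in range(n))
        if rank(basis + [e]) > rank(basis):   # e is outside span(basis)
            basis.append(e)
    return basis


# Exercise 13: e1 already lies in span{v1, v2}, so e2 and e3 are adjoined
print(extend_to_basis([(1, -4, 2, -3), (-3, 8, -4, 6)], 4))
```

Trying the $e_i$ in a different order can produce a different, equally valid basis, which is why the answer to Exercise 13 allows any two of three vectors.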


We conclude this section with a theorem that relates the dimension of a vector space to the dimensions of its 
subspaces. 


THEOREM 4.5.6

If W is a subspace of a finite-dimensional vector space V, then:

(a) W is finite-dimensional.

(b) $\dim(W) \le \dim(V)$.

(c) W = V if and only if $\dim(W) = \dim(V)$.


Proof (a) We will leave the proof of this part for the exercises.

Proof (b) Part (a) shows that W is finite-dimensional, so it has a basis

$$S = \{w_1, w_2, \ldots, w_m\}$$

Either S is also a basis for V or it is not. If so, then $\dim(V) = m$, which means that $\dim(V) = \dim(W)$. If not, then because S is a linearly independent set it can be enlarged to a basis for V by part (b) of Theorem 4.5.5. But this implies that $\dim(W) < \dim(V)$, so we have shown that $\dim(W) \le \dim(V)$ in all cases.

Proof (c) Assume that $\dim(W) = \dim(V)$ and that

$$S = \{w_1, w_2, \ldots, w_m\}$$

is a basis for W. If S is not also a basis for V, then being linearly independent S can be extended to a basis for V by part (b) of Theorem 4.5.5. But this would mean that $\dim(V) > \dim(W)$, which contradicts our hypothesis. Thus S must also be a basis for V, which means that W = V. The converse, that W = V implies $\dim(W) = \dim(V)$, is immediate.


Figure 4.5.2 illustrates the geometric relationship between the subspaces of $R^3$ in order of increasing dimension.



OPTIONAL 

We conclude this section with optional proofs of Theorem 4.5.2, Theorem 4.5.3, and Theorem 4.5.5. 

Proof of Theorem 4.5.2 (a) Let $S' = \{w_1, w_2, \ldots, w_m\}$ be any set of m vectors in V, where m > n. We want to show that S' is linearly dependent. Since $S = \{v_1, v_2, \ldots, v_n\}$ is a basis, each $w_i$ can be expressed as a linear combination of the vectors in S, say

$$\begin{aligned} w_1 &= a_{11}v_1 + a_{21}v_2 + \cdots + a_{n1}v_n \\ w_2 &= a_{12}v_1 + a_{22}v_2 + \cdots + a_{n2}v_n \\ &\;\;\vdots \\ w_m &= a_{1m}v_1 + a_{2m}v_2 + \cdots + a_{nm}v_n \end{aligned} \tag{1}$$

To show that S' is linearly dependent, we must find scalars $k_1, k_2, \ldots, k_m$, not all zero, such that

$$k_1 w_1 + k_2 w_2 + \cdots + k_m w_m = 0 \tag{2}$$

Using the equations in (1), we can rewrite (2) as

$$\begin{aligned} &(k_1 a_{11} + k_2 a_{12} + \cdots + k_m a_{1m})v_1 \\ &+ (k_1 a_{21} + k_2 a_{22} + \cdots + k_m a_{2m})v_2 \\ &+ \cdots + (k_1 a_{n1} + k_2 a_{n2} + \cdots + k_m a_{nm})v_n = 0 \end{aligned}$$

Thus, from the linear independence of S, the problem of proving that S' is a linearly dependent set reduces to showing there are scalars $k_1, k_2, \ldots, k_m$, not all zero, that satisfy

$$\begin{aligned} a_{11}k_1 + a_{12}k_2 + \cdots + a_{1m}k_m &= 0 \\ a_{21}k_1 + a_{22}k_2 + \cdots + a_{2m}k_m &= 0 \\ &\;\;\vdots \\ a_{n1}k_1 + a_{n2}k_2 + \cdots + a_{nm}k_m &= 0 \end{aligned} \tag{3}$$

But (3) has more unknowns than equations, so the proof is complete since Theorem 1.2.2 guarantees the existence of nontrivial solutions.

Proof of Theorem 4.5.2 (b) Let $S' = \{w_1, w_2, \ldots, w_m\}$ be any set of m vectors in V, where m < n. We want to show that S' does not span V. We will do this by showing that the assumption that S' spans V leads to a contradiction of the linear independence of $\{v_1, v_2, \ldots, v_n\}$. If S' spans V, then every vector in V is a linear combination of the vectors in S'. In particular, each basis vector $v_i$ is a linear combination of the vectors in S', say

$$\begin{aligned} v_1 &= a_{11}w_1 + a_{21}w_2 + \cdots + a_{m1}w_m \\ v_2 &= a_{12}w_1 + a_{22}w_2 + \cdots + a_{m2}w_m \\ &\;\;\vdots \\ v_n &= a_{1n}w_1 + a_{2n}w_2 + \cdots + a_{mn}w_m \end{aligned} \tag{4}$$

To obtain our contradiction, we will show that there are scalars $k_1, k_2, \ldots, k_n$, not all zero, such that

$$k_1 v_1 + k_2 v_2 + \cdots + k_n v_n = 0 \tag{5}$$

But (4) and (5) have the same form as (1) and (2) except that m and n are interchanged and the w's and v's are interchanged. Thus, the computations that led to (3) now yield

$$\begin{aligned} a_{11}k_1 + a_{12}k_2 + \cdots + a_{1n}k_n &= 0 \\ a_{21}k_1 + a_{22}k_2 + \cdots + a_{2n}k_n &= 0 \\ &\;\;\vdots \\ a_{m1}k_1 + a_{m2}k_2 + \cdots + a_{mn}k_n &= 0 \end{aligned}$$

This linear system has more unknowns than equations and hence has nontrivial solutions by Theorem 1.2.2. Any such solution satisfies (5) with $k_1, k_2, \ldots, k_n$ not all zero, which contradicts the linear independence of $\{v_1, v_2, \ldots, v_n\}$.

Proof of Theorem 4.5.3 (a) Assume that $S = \{v_1, v_2, \ldots, v_r\}$ is a linearly independent set of vectors in V, and v is a vector in V outside of span(S). To show that $S' = \{v_1, v_2, \ldots, v_r, v\}$ is a linearly independent set, we must show that the only scalars that satisfy

$$k_1 v_1 + k_2 v_2 + \cdots + k_r v_r + k_{r+1} v = 0 \tag{6}$$

are $k_1 = k_2 = \cdots = k_r = k_{r+1} = 0$. But it must be true that $k_{r+1} = 0$, for otherwise we could solve (6) for v as a linear combination of $v_1, v_2, \ldots, v_r$, contradicting the assumption that v is outside of span(S). Thus, (6) simplifies to

$$k_1 v_1 + k_2 v_2 + \cdots + k_r v_r = 0 \tag{7}$$

which, by the linear independence of $\{v_1, v_2, \ldots, v_r\}$, implies that

$$k_1 = k_2 = \cdots = k_r = 0$$

Proof of Theorem 4.5.3 (b) Assume that $S = \{v_1, v_2, \ldots, v_r\}$ is a set of vectors in V, and (to be specific) suppose that $v_r$ is a linear combination of $v_1, v_2, \ldots, v_{r-1}$, say

$$v_r = c_1 v_1 + c_2 v_2 + \cdots + c_{r-1} v_{r-1} \tag{8}$$

We want to show that if $v_r$ is removed from S, then the remaining vectors $v_1, v_2, \ldots, v_{r-1}$ still span span(S); that is, we must show that every vector w in span(S) is expressible as a linear combination of $v_1, v_2, \ldots, v_{r-1}$. But if w is in span(S), then w is expressible in the form

$$w = k_1 v_1 + k_2 v_2 + \cdots + k_{r-1} v_{r-1} + k_r v_r$$

or, on substituting (8),

$$w = k_1 v_1 + k_2 v_2 + \cdots + k_{r-1} v_{r-1} + k_r(c_1 v_1 + c_2 v_2 + \cdots + c_{r-1} v_{r-1})$$

which expresses w as a linear combination of $v_1, v_2, \ldots, v_{r-1}$.

Proof of Theorem 4.5.5 (a) If S is a set of vectors that spans V but is not a basis for V, then S is a linearly dependent set. Thus some vector v in S is expressible as a linear combination of the other vectors in S. By the Plus/Minus Theorem (4.5.3b), we can remove v from S, and the resulting set S' will still span V. If S' is linearly independent, then S' is a basis for V, and we are done. If S' is linearly dependent, then we can remove some appropriate vector from S' to produce a set S'' that still spans V. Since S is finite, we can continue removing vectors in this way only finitely many times, so we finally arrive at a set of vectors in S that is linearly independent and spans V. This subset of S is a basis for V.

Proof of Theorem 4.5.5 (b) Suppose that dim(V) = n. If S is a linearly independent set that is not already a basis for V, then S fails to span V, so there is some vector v in V that is not in span(S). By the Plus/Minus Theorem (4.5.3a), we can insert v into S, and the resulting set S' will still be linearly independent. If S' spans V, then S' is a basis for V, and we are finished. If S' does not span V, then we can insert an appropriate vector into S' to produce a set S'' that is still linearly independent. We can continue inserting vectors in this way until we reach a set with n linearly independent vectors in V. This set will be a basis for V by Theorem 4.5.4.
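The proof of part (a) removes redundant vectors one at a time. An equivalent way to reach a basis subset, sketched below as a variant rather than the text's exact procedure, sweeps through S once and keeps each vector that is not in the span of the vectors already kept. The sketch works in $R^n$, so polynomials are first replaced by their coordinate vectors relative to the standard basis of $P_2$, as the hint to Exercise 20 suggests; that conversion is done by hand here, and `rank` and `reduce_to_basis` are hypothetical names.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of the span of `vectors`, by exact forward elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def reduce_to_basis(S):
    """Keep each vector that is not a combination of the ones already kept;
    the kept vectors are independent and span the same space as S."""
    kept = []
    for v in S:
        if rank(kept + [v]) > len(kept):
            kept.append(v)
    return kept


# Exercise 20(b): 1 + x, x^2, -2 + 2x^2, -3x as coordinate vectors (c0, c1, c2)
print(reduce_to_basis([(1, 1, 0), (0, 0, 1), (-2, 0, 2), (0, -3, 0)]))
# [(1, 1, 0), (0, 0, 1), (-2, 0, 2)]  ->  basis 1 + x, x^2, -2 + 2x^2
```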


Concept Review 

Dimension 

Relationships among the concepts of linear independence, basis, and dimension 

Skills 

Find a basis for and the dimension of the solution space of a homogeneous linear system. 

Use dimension to determine whether a set of vectors is a basis for a finite-dimensional vector space. 
Extend a linearly independent set to a basis. 


Exercise Set 4.5 

In Exercises 1-6, find a basis for the solution space of the homogeneous linear system, and find the 
dimension of that space. 

1.
$$\begin{aligned} x_1 + x_2 - x_3 &= 0 \\ -2x_1 - x_2 + 2x_3 &= 0 \\ -x_1 + x_3 &= 0 \end{aligned}$$

Answer:

Basis: (1, 0, 1); dimension = 1


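2.
$$\begin{aligned} 3x_1 + x_2 + x_3 + x_4 &= 0 \\ 5x_1 - x_2 + x_3 - x_4 &= 0 \end{aligned}$$

3.
$$\begin{aligned} x_1 - 4x_2 + 3x_3 - x_4 &= 0 \\ 2x_1 - 8x_2 + 6x_3 - 2x_4 &= 0 \end{aligned}$$

Answer:

Basis: (4, 1, 0, 0), (-3, 0, 1, 0), (1, 0, 0, 1); dimension = 3

4.
$$\begin{aligned} x_1 - 3x_2 + x_3 &= 0 \\ 2x_1 - 6x_2 + 2x_3 &= 0 \\ 3x_1 - 9x_2 + 3x_3 &= 0 \end{aligned}$$

5.
$$\begin{aligned} 2x_1 + x_2 + 3x_3 &= 0 \\ x_1 + 5x_3 &= 0 \\ x_2 + x_3 &= 0 \end{aligned}$$

Answer:

No basis; dimension = 0

6.
$$\begin{aligned} x + y + z &= 0 \\ 3x + 2y - 2z &= 0 \\ 4x + 3y - z &= 0 \\ 6x + 5y + z &= 0 \end{aligned}$$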

7. Find bases for the following subspaces of $R^3$.

(a) The plane $3x - 2y + 5z = 0$.

(b) The plane $x - y = 0$.

(c) The line $x = 2t$, $y = -t$, $z = 4t$.

(d) All vectors of the form (a, b, c), where $b = a + c$.

Answer:

(b) (1, 1, 0), (0, 0, 1)

(c) (2, -1, 4)

(d) (1, 1, 0), (0, 1, 1)


8. Find the dimensions of the following subspaces of $R^4$.

(a) All vectors of the form (a, b, c, 0).

(b) All vectors of the form (a, b, c, d), where $d = a + b$ and $c = a - b$.

(c) All vectors of the form (a, b, c, d), where $a = b = c = d$.

9. Find the dimension of each of the following vector spaces.

(a) The vector space of all diagonal n × n matrices.

(b) The vector space of all symmetric n × n matrices.

(c) The vector space of all upper triangular n × n matrices.

Answer:

(a) n

(b) $\frac{n(n+1)}{2}$

(c) $\frac{n(n+1)}{2}$

10. Find the dimension of the subspace of $P_3$ consisting of all polynomials $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ for which $a_0 = 0$.

11. (a) Show that the set W of all polynomials in $P_2$ such that p(1) = 0 is a subspace of $P_2$.

(b) Make a conjecture about the dimension of W.

(c) Confirm your conjecture by finding a basis for W.

12. Find a standard basis vector for $R^3$ that can be added to the set $\{v_1, v_2\}$ to produce a basis for $R^3$.

(a) $v_1 = (-1, 2, 3)$, $v_2 = (1, -2, -2)$

(b) $v_1 = (1, -1, 0)$, $v_2 = (3, 1, -2)$

13. Find standard basis vectors for $R^4$ that can be added to the set $\{v_1, v_2\}$ to produce a basis for $R^4$.

$v_1 = (1, -4, 2, -3)$, $v_2 = (-3, 8, -4, 6)$

Answer:

Any two of (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1) can be used.

14. Let $\{v_1, v_2, v_3\}$ be a basis for a vector space V. Show that $\{u_1, u_2, u_3\}$ is also a basis, where $u_1 = v_1$, $u_2 = v_1 + v_2$, and $u_3 = v_1 + v_2 + v_3$.

15. The vectors $v_1 = (1, -2, 3)$ and $v_2 = (0, 5, -3)$ are linearly independent. Enlarge $\{v_1, v_2\}$ to a basis for $R^3$.

Answer:

$v_3 = (a, b, c)$ with $9a - 3b - 5c \ne 0$

16. The vectors $v_1 = (1, -2, 3, -5)$ and $v_2 = (0, -1, 2, -3)$ are linearly independent. Enlarge $\{v_1, v_2\}$ to a basis for $R^4$.

17. (a) Show that for every positive integer n, one can find n + 1 linearly independent vectors in $F(-\infty, \infty)$. [Hint: Look for polynomials.]

(b) Use the result in part (a) to prove that $F(-\infty, \infty)$ is infinite-dimensional.

(c) Prove that $C(-\infty, \infty)$, $C^m(-\infty, \infty)$, and $C^\infty(-\infty, \infty)$ are infinite-dimensional vector spaces.

18. Let S be a basis for an n-dimensional vector space V. Show that if $v_1, v_2, \ldots, v_r$ form a linearly independent set of vectors in V, then the coordinate vectors $(v_1)_S, (v_2)_S, \ldots, (v_r)_S$ form a linearly independent set in $R^n$, and conversely.

19. Using the notation from Exercise 18, show that if the vectors $v_1, v_2, \ldots, v_r$ span V, then the coordinate vectors $(v_1)_S, (v_2)_S, \ldots, (v_r)_S$ span $R^n$, and conversely.

20. Find a basis for the subspace of $P_2$ spanned by the given vectors.

(a) $-1 + x - 2x^2$, $3 + 3x + 6x^2$, $9$

(b) $1 + x$, $x^2$, $-2 + 2x^2$, $-3x$

(c) $1 + x - 3x^2$, $2 + 2x - 6x^2$, $3 + 3x - 9x^2$

[Hint: Let S be the standard basis for $P_2$, and work with the coordinate vectors relative to S as in Exercises 18 and 19.]

21. Prove: A subspace of a finite-dimensional vector space is finite-dimensional. 

22. State the two parts of Theorem 4.5.2 in contrapositive form. 

True-False Exercises 

In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 

(a) The zero vector space has dimension zero. 

Answer: 

True 

(b) There is a set of 17 linearly independent vectors in $R^{17}$.

Answer:

True

(c) There is a set of 11 vectors that span $R^{17}$.

Answer:

False

(d) Every linearly independent set of five vectors in $R^5$ is a basis for $R^5$.

Answer:

True

(e) Every set of five vectors that spans $R^5$ is a basis for $R^5$.

Answer:

True

(f) Every set of vectors that spans R n contains a basis for R n . 

Answer: 


True 


(g) Every linearly independent set of vectors in R n is contained in some basis for R n . 

Answer: 

True 

(h) There is a basis for M 22 consisting of invertible matrices. 

Answer: 

True 

(i) If A has size n × n and $I_n, A, A^2, \ldots, A^{n^2}$ are distinct matrices, then $\{I_n, A, A^2, \ldots, A^{n^2}\}$ is a linearly dependent set.

Answer:

True

(j) There are at least two distinct three-dimensional subspaces of $P_2$.

Answer:

False
4.6 Change of Basis

A basis that is suitable for one problem may not be suitable for another, so it is common in the study of vector spaces to change from one basis to another. Because a basis is the vector space generalization of a coordinate system, changing bases is akin to changing coordinate axes in $R^2$ and $R^3$. In this section we will study problems related to change of basis.


Coordinate Maps 

If $S = \{v_1, v_2, \ldots, v_n\}$ is a basis for a finite-dimensional vector space V, and if

$$(v)_S = (c_1, c_2, \ldots, c_n)$$

is the coordinate vector of v relative to S, then, as observed in Section 4.4, the mapping

$$v \to (v)_S \tag{1}$$

creates a connection (a one-to-one correspondence) between vectors in the general vector space V and vectors in the familiar vector space $R^n$. We call (1) the coordinate map from V to $R^n$. In this section we will find it convenient to express coordinate vectors in the matrix form

$$[v]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \tag{2}$$

where the square brackets emphasize the matrix notation (Figure 4.6.1).

Figure 4.6.1 Coordinate map $v \to [v]_S$ from V to $R^n$


Change of Basis 

There are many applications in which it is necessary to work with more than one coordinate system. In such 
cases it becomes important to know how the coordinates of a fixed vector relative to each coordinate system 
are related. This leads to the following problem. 






The Change-of-Basis Problem 

If $v$ is a vector in a finite-dimensional vector space $V$, and if we change the basis for $V$ from a basis $B$
to a basis $B'$, how are the coordinate vectors $[v]_B$ and $[v]_{B'}$ related?

To solve this problem, it will be convenient to refer to B as the “old basis” and B' as the “new 
basis.” Thus, our objective is to find a relationship between the old and new coordinates of a fixed vector v in 
V. 


For simplicity, we will solve this problem for two-dimensional spaces. The solution for $n$-dimensional spaces
is similar. Let

$$B = \{u_1, u_2\} \quad \text{and} \quad B' = \{u_1', u_2'\}$$

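be the old and new bases, respectively. We will need the coordinate vectors for the new basis vectors relative
to the old basis. Suppose they are

$$[u_1']_B = \begin{bmatrix} a \\ b \end{bmatrix} \quad \text{and} \quad [u_2']_B = \begin{bmatrix} c \\ d \end{bmatrix} \qquad (3)$$

That is,

$$\begin{aligned} u_1' &= a u_1 + b u_2 \\ u_2' &= c u_1 + d u_2 \end{aligned} \qquad (4)$$

Now let $v$ be any vector in $V$, and let

$$[v]_{B'} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} \qquad (5)$$

be the new coordinate vector, so that

$$v = k_1 u_1' + k_2 u_2' \qquad (6)$$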

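In order to find the old coordinates of $v$, we must express $v$ in terms of the old basis $B$. To do this, we
substitute (4) into (6). This yields

$$v = k_1(a u_1 + b u_2) + k_2(c u_1 + d u_2)$$

or

$$v = (k_1 a + k_2 c) u_1 + (k_1 b + k_2 d) u_2$$

Thus, the old coordinate vector for $v$ is

$$[v]_B = \begin{bmatrix} k_1 a + k_2 c \\ k_1 b + k_2 d \end{bmatrix}$$

which, by using (5), can be written as

$$[v]_B = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} a & c \\ b & d \end{bmatrix} [v]_{B'}$$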

This equation states that the old coordinate vector $[v]_B$ results when we multiply the new coordinate vector
$[v]_{B'}$ on the left by the matrix

$$P = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$$

Since the columns of this matrix are the coordinates of the new basis vectors relative to the old basis [see (3)],
we have the following solution of the change-of-basis problem.


Solution of the Change-of-Basis Problem 

If we change the basis for a vector space $V$ from an old basis $B = \{u_1, u_2, \ldots, u_n\}$ to a new basis
$B' = \{u_1', u_2', \ldots, u_n'\}$, then for each vector $v$ in $V$, the old coordinate vector $[v]_B$ is related to the
new coordinate vector $[v]_{B'}$ by the equation

$$[v]_B = P[v]_{B'} \qquad (7)$$

where the columns of $P$ are the coordinate vectors of the new basis vectors relative to the old basis;
that is, the column vectors of $P$ are

$$[u_1']_B, \; [u_2']_B, \; \ldots, \; [u_n']_B \qquad (8)$$
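The two-dimensional derivation above can be checked numerically. In this sketch the helper names and the two bases are hypothetical, chosen only for illustration. Each column of $P$ is computed as the old-basis coordinates of a new basis vector. Those coordinates come from Cramer's rule, which is swapped in for the substitution argument used in the text. The code then compares $[v]_B$ with $P[v]_{B'}$, as Equation (7) asserts.

```python
from fractions import Fraction

def coords_2d(b1, b2, v):
    """Coordinates of v relative to the basis {b1, b2} of R^2, by Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]  # nonzero because {b1, b2} is a basis
    return [Fraction(v[0] * b2[1] - b2[0] * v[1], det),
            Fraction(b1[0] * v[1] - v[0] * b1[1], det)]

def transition_matrix_2d(old, new):
    """P_{B'->B}: its columns are [u'_1]_B and [u'_2]_B, as in (8)."""
    a, b = coords_2d(old[0], old[1], new[0])  # [u'_1]_B
    c, d = coords_2d(old[0], old[1], new[1])  # [u'_2]_B
    return [[a, c],
            [b, d]]

def mat_vec(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

# Hypothetical bases, chosen only for illustration.
old_basis = [(1, 2), (3, 1)]   # B
new_basis = [(2, 1), (-3, 4)]  # B'
P = transition_matrix_2d(old_basis, new_basis)

v = (5, -3)
v_old = coords_2d(*old_basis, v)  # [v]_B
v_new = coords_2d(*new_basis, v)  # [v]_B'
print(v_old == mat_vec(P, v_new))  # True, as Equation (7) predicts
```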


Transition Matrices 

The matrix $P$ in Equation 7 is called the transition matrix from $B'$ to $B$. For emphasis, we will often denote it
by $P_{B' \to B}$. It follows from (8) that this matrix can be expressed in terms of its column vectors as

$$P_{B' \to B} = \big[\, [u_1']_B \mid [u_2']_B \mid \cdots \mid [u_n']_B \,\big] \qquad (9)$$

Similarly, the transition matrix from $B$ to $B'$ can be expressed in terms of its column vectors as

$$P_{B \to B'} = \big[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\big] \qquad (10)$$

There is a simple way to remember both of these formulas using the terms “old basis” and “new
basis” defined earlier in this section: In Formula 9 the old basis is $B'$ and the new basis is $B$, whereas in
Formula 10 the old basis is $B$ and the new basis is $B'$. Thus, both formulas can be restated as follows:






The columns of the transition matrix from an old basis to a new basis are the coordinate vectors of the 
old basis relative to the new basis. 


EXAMPLE 1 Finding Transition Matrices 

Consider the bases $B = \{u_1, u_2\}$ and $B' = \{u_1', u_2'\}$ for $R^2$, where

$$u_1 = (1, 0), \quad u_2 = (0, 1), \quad u_1' = (1, 1), \quad u_2' = (2, 1)$$

(a) Find the transition matrix $P_{B' \to B}$ from $B'$ to $B$.

(b) Find the transition matrix $P_{B \to B'}$ from $B$ to $B'$.


Solution 

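(a) Here the old basis vectors are $u_1'$ and $u_2'$ and the new basis vectors are $u_1$ and $u_2$. We want
to find the coordinate matrices of the old basis vectors $u_1'$ and $u_2'$ relative to the new basis
vectors $u_1$ and $u_2$. To do this, first we observe that

$$\begin{aligned} u_1' &= u_1 + u_2 \\ u_2' &= 2u_1 + u_2 \end{aligned}$$

from which it follows that

$$[u_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad [u_2']_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

and hence that

$$P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$$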


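(b) Here the old basis vectors are $u_1$ and $u_2$ and the new basis vectors are $u_1'$ and $u_2'$. As in part
(a), we want to find the coordinate matrices of the old basis vectors $u_1$ and $u_2$ relative to
the new basis vectors $u_1'$ and $u_2'$. To do this, observe that

$$\begin{aligned} u_1 &= -u_1' + u_2' \\ u_2 &= 2u_1' - u_2' \end{aligned}$$

from which it follows that

$$[u_1]_{B'} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad \text{and} \quad [u_2]_{B'} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$$

and hence that

$$P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}$$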


Suppose now that $B$ and $B'$ are bases for a finite-dimensional vector space $V$. Since multiplication by $P_{B' \to B}$
maps coordinate vectors relative to the basis $B'$ into coordinate vectors relative to the basis $B$, and $P_{B \to B'}$ maps
coordinate vectors relative to $B$ into coordinate vectors relative to $B'$, it follows that for every vector $v$ in $V$
we have

$$[v]_B = P_{B' \to B}[v]_{B'} \qquad (11)$$

$$[v]_{B'} = P_{B \to B'}[v]_B \qquad (12)$$


EXAMPLE 2 Computing Coordinate Vectors 

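Let $B$ and $B'$ be the bases in Example 1. Use an appropriate formula to find $[v]_B$ given that

$$[v]_{B'} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}$$

Solution

To find $[v]_B$ we need to make the transition from $B'$ to $B$. It follows from Formula
11 and part (a) of Example 1 that

$$[v]_B = P_{B' \to B}[v]_{B'} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 7 \\ 2 \end{bmatrix}$$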
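Example 2 can be checked in a few lines of Python. This is a sketch, and the variable names are illustrative. It applies Formula 11 with the matrix from Example 1(a). It then rebuilds $v$ from both coordinate vectors to confirm that $[v]_B$ and $[v]_{B'}$ describe the same vector.

```python
# Transition matrix P_{B'->B} from Example 1(a) and the given coordinates [v]_B'
P_Bprime_to_B = [[1, 2],
                 [1, 1]]
v_Bprime = [-3, 5]

# Formula (11): [v]_B = P_{B'->B} [v]_B'
v_B = [sum(p * x for p, x in zip(row, v_Bprime)) for row in P_Bprime_to_B]
print(v_B)  # [7, 2]

# Cross-check: rebuild v itself from each set of coordinates.
u1, u2 = (1, 0), (0, 1)    # basis B
u1p, u2p = (1, 1), (2, 1)  # basis B'
from_B = tuple(v_B[0] * a + v_B[1] * b for a, b in zip(u1, u2))
from_Bprime = tuple(v_Bprime[0] * a + v_Bprime[1] * b for a, b in zip(u1p, u2p))
print(from_B == from_Bprime)  # True: both give the vector (7, 2)
```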


Invertibility of Transition Matrices 

If $B$ and $B'$ are bases for a finite-dimensional vector space $V$, then

$$(P_{B' \to B})(P_{B \to B'}) = P_{B \to B}$$

because multiplication by the product $(P_{B' \to B})(P_{B \to B'})$ first maps the $B$-coordinates of a vector into its $B'$-coordinates, and
then maps those $B'$-coordinates back into the original $B$-coordinates. Since the net effect of the two operations
is to leave each coordinate vector unchanged, we are led to conclude that $P_{B \to B}$ must be the identity matrix,
that is,

$$(P_{B' \to B})(P_{B \to B'}) = I \qquad (13)$$

(we omit the formal proof). For example, for the transition matrices obtained in Example 1 we have

$$(P_{B' \to B})(P_{B \to B'}) = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$

It follows from (13) that $P_{B' \to B}$ is invertible and that its inverse is $P_{B \to B'}$. Thus, we have the following
theorem.
















THEOREM 4.6.1 


If $P$ is the transition matrix from a basis $B'$ to a basis $B$ for a finite-dimensional vector space $V$, then $P$
is invertible and $P^{-1}$ is the transition matrix from $B$ to $B'$.
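Theorem 4.6.1 can be checked on the matrices of Example 1. This sketch assumes $2 \times 2$ matrices and uses the familiar adjugate formula for the inverse. The helpers `inverse_2x2` and `mat_mul` are illustrative and are not part of the text.

```python
from fractions import Fraction

def inverse_2x2(P):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

P_Bprime_to_B = [[1, 2], [1, 1]]    # from Example 1(a)
P_B_to_Bprime = [[-1, 2], [1, -1]]  # from Example 1(b)

print(mat_mul(P_Bprime_to_B, P_B_to_Bprime))        # [[1, 0], [0, 1]]
print(inverse_2x2(P_Bprime_to_B) == P_B_to_Bprime)  # True
```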


An Efficient Method for Computing Transition Matrices for R n 

Our next objective is to develop an efficient procedure for computing transition matrices between bases for
$R^n$. As illustrated in Example 1, the first step in computing a transition matrix is to express each old basis
vector as a linear combination of the new basis vectors. For $R^n$ this involves solving $n$ linear systems of $n$
equations in $n$ unknowns, each of which has the same coefficient matrix (why?). An efficient way to do this is
by the method illustrated in Example 2 of Section 1.6, which is as follows:

A Procedure for Computing $P_{B \to B'}$

Step 1 Form the matrix $[B' \mid B]$.

Step 2 Use elementary row operations to reduce the matrix in Step 1 to reduced row echelon form.

Step 3 The resulting matrix will be $[I \mid P_{B \to B'}]$.

Step 4 Extract the matrix $P_{B \to B'}$ from the right side of the matrix in Step 3.

This procedure is captured in the following diagram.

$$[\,\text{new basis} \mid \text{old basis}\,] \xrightarrow{\text{row operations}} [\, I \mid \text{transition from old to new}\,] \qquad (14)$$
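The procedure in (14) translates almost line for line into code. In this Python sketch the function name `transition_matrix` is hypothetical. It places the new basis vectors and then the old basis vectors as columns, reduces with exact arithmetic, and returns the right-hand block. It assumes both inputs are bases of the same $R^n$, so a nonzero pivot always exists.

```python
from fractions import Fraction

def transition_matrix(old_basis, new_basis):
    """P_{old->new} via Formula 14: row reduce [new basis | old basis] to [I | P].

    Basis vectors are given as tuples and placed in the matrix as columns.
    """
    n = len(new_basis)
    # Row i holds the i-th components of the new basis vectors, then of the old ones.
    M = [[Fraction(u[i]) for u in new_basis] + [Fraction(u[i]) for u in old_basis]
         for i in range(n)]
    for col in range(n):
        p = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[p] = M[p], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[col])]
    # The left block is now I; the right block is the transition matrix.
    return [row[n:] for row in M]

# Example 3(b): old basis B = standard basis, new basis B' = {(1, 1), (2, 1)}
P = transition_matrix([(1, 0), (0, 1)], [(1, 1), (2, 1)])
print(P == [[-1, 2], [1, -1]])  # True
```

When the old basis is the standard basis, the right block of $[\,\text{new basis} \mid I\,]$ reduces to the inverse of the new-basis matrix. This is why the same routine also yields $P_{S \to B}$ in Exercise 13(b).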


EXAMPLE 3 Example 1 Revisited 

In Example 1 we considered the bases $B = \{u_1, u_2\}$ and $B' = \{u_1', u_2'\}$ for $R^2$, where

$$u_1 = (1, 0), \quad u_2 = (0, 1), \quad u_1' = (1, 1), \quad u_2' = (2, 1)$$

(a) Use Formula 14 to find the transition matrix from $B'$ to $B$.

(b) Use Formula 14 to find the transition matrix from $B$ to $B'$.


Solution 


(a) Here $B'$ is the old basis and $B$ is the new basis, so

$$[\,\text{new basis} \mid \text{old basis}\,] = \left[\begin{array}{cc|cc} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \end{array}\right]$$

Since the left side is already the identity matrix, no reduction is needed. We see by
inspection that the transition matrix is

$$P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$$

which agrees with the result in Example 1.

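(b) Here $B$ is the old basis and $B'$ is the new basis, so

$$[\,\text{new basis} \mid \text{old basis}\,] = \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{array}\right]$$

By reducing this matrix so that the left side becomes the identity, we obtain (verify)

$$[\, I \mid \text{transition from old to new}\,] = \left[\begin{array}{cc|cc} 1 & 0 & -1 & 2 \\ 0 & 1 & 1 & -1 \end{array}\right]$$

so the transition matrix is

$$P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}$$

which also agrees with the result in Example 1.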

Transition to the Standard Basis for R n 

Note that in part (a) of the last example the column vectors of the matrix that made the transition from the
basis $B'$ to the standard basis turned out to be the vectors in $B'$ written in column form. This illustrates the
following general result.


THEOREM 4.6.2 

Let $B' = \{u_1, u_2, \ldots, u_n\}$ be any basis for the vector space $R^n$ and let $S = \{e_1, e_2, \ldots, e_n\}$ be the
standard basis for $R^n$. If the vectors in these bases are written in column form, then

$$P_{B' \to S} = [\, u_1 \mid u_2 \mid \cdots \mid u_n \,] \qquad (15)$$


It follows from this theorem that if

$$A = [\, u_1 \mid u_2 \mid \cdots \mid u_n \,]$$

is any invertible $n \times n$ matrix, then $A$ can be viewed as the transition matrix from the basis $\{u_1, u_2, \ldots, u_n\}$
for $R^n$ to the standard basis for $R^n$. Thus, for example, the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$$

which was shown to be invertible in Example 4 of Section 1.5, is the transition matrix from the basis

$$u_1 = (1, 2, 1), \quad u_2 = (2, 5, 0), \quad u_3 = (3, 3, 8)$$

to the basis

$$e_1 = (1, 0, 0), \quad e_2 = (0, 1, 0), \quad e_3 = (0, 0, 1)$$
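Here is a quick numerical illustration of Theorem 4.6.2 using the matrix $A$ above. It is a sketch, and `mat_vec` is an ad hoc helper. Since $[u_j]_B$ is the $j$th standard unit vector, multiplying it by $A$ returns $u_j$ itself. Multiplying any $B$-coordinate vector by $A$ returns the standard coordinates of that vector. The coordinate vector $(-239, 77, 30)$ used here is the one listed in the answer to Exercise 13(d).

```python
# Columns of A are u_1 = (1, 2, 1), u_2 = (2, 5, 0), u_3 = (3, 3, 8), so A = P_{B->S}.
A = [[1, 2, 3],
     [2, 5, 3],
     [1, 0, 8]]

def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# [u_j]_B is the j-th standard unit vector, and A maps it to u_j itself.
for j in range(3):
    e_j = [1 if k == j else 0 for k in range(3)]
    print(mat_vec(A, e_j))  # [1, 2, 1], then [2, 5, 0], then [3, 3, 8]

# A vector with B-coordinates (-239, 77, 30) has standard coordinates (5, -3, 1).
print(mat_vec(A, [-239, 77, 30]))  # [5, -3, 1]
```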


Concept Review 

Coordinate map 
Change-of-basis problem 
Transition matrix 

Skills 

Find coordinate vectors relative to a given basis directly. 
Find the transition matrix from one basis to another. 

Use the transition matrix to compute coordinate vectors. 


Exercise Set 4.6 

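1. Find the coordinate vector for $w$ relative to the basis $S = \{u_1, u_2\}$ for $R^2$.

(a) $u_1 = (1, 0)$, $u_2 = (0, 1)$; $w = (3, -7)$

(b) $u_1 = (2, -4)$, $u_2 = (3, 8)$; $w = (1, 1)$

(c) $u_1 = (1, 1)$, $u_2 = (0, 2)$; $w = (a, b)$

Answer:

(a) $[w]_S = \begin{bmatrix} 3 \\ -7 \end{bmatrix}$

(b) $[w]_S = \begin{bmatrix} 5/28 \\ 3/14 \end{bmatrix}$

(c) $[w]_S = \begin{bmatrix} a \\ (b - a)/2 \end{bmatrix}$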

2. Find the coordinate vector for $v$ relative to the basis $S = \{v_1, v_2, v_3\}$ for $R^3$.

(a) $v = (2, -1, 3)$; $v_1 = (1, 0, 0)$, $v_2 = (2, 2, 0)$, $v_3 = (3, 3, 3)$

(b) $v = (5, -12, 3)$; $v_1 = (1, 2, 3)$, $v_2 = (-4, 5, 6)$, $v_3 = (7, -8, 9)$

3. Find the coordinate vector for $p$ relative to the basis $S = \{p_1, p_2, p_3\}$ for $P_2$.

(a) $p = 4 - 3x + x^2$; $p_1 = 1$, $p_2 = x$, $p_3 = x^2$

(b) $p = 2 - x + x^2$; $p_1 = 1 + x$, $p_2 = 1 + x^2$, $p_3 = x + x^2$

Answer:

(a) $(p)_S = (4, -3, 1)$, $[p]_S = \begin{bmatrix} 4 \\ -3 \\ 1 \end{bmatrix}$

(b) $(p)_S = (0, 2, -1)$, $[p]_S = \begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}$

4. Find the coordinate vector for $A$ relative to the basis $S = \{A_1, A_2, A_3, A_4\}$ for $M_{22}$.



5. Consider the coordinate vectors $[w]_S$, $[q]_S$, and $[B]_S$.

(a) Find $w$ if $S$ is the basis in Exercise 2(a).

(b) Find $q$ if $S$ is the basis in Exercise 3(a).

(c) Find $B$ if $S$ is the basis in Exercise 4.

Answer:

(a) $w = (16, 10, 12)$

(b) $q = 3 + 4x^2$

(c) $B = \begin{bmatrix} 15 & -1 \\ 6 & 3 \end{bmatrix}$




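6. Consider the bases $B = \{u_1, u_2\}$ and $B' = \{u_1', u_2'\}$ for $R^2$, where

$$u_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad u_1' = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad u_2' = \begin{bmatrix} -3 \\ 4 \end{bmatrix}$$

(a) Find the transition matrix from $B'$ to $B$.

(b) Find the transition matrix from $B$ to $B'$.

(c) Compute the coordinate vector $[w]_B$, where

$$w = \begin{bmatrix} 3 \\ -5 \end{bmatrix}$$

and use (12) to compute $[w]_{B'}$.

(d) Check your work by computing $[w]_{B'}$ directly.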

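7. Repeat the directions of Exercise 6 with the same vector $w$ but with

$$u_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 4 \\ -1 \end{bmatrix}, \quad u_1' = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad u_2' = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$

Answer:

(a) $\begin{bmatrix} 13/10 & -1/2 \\ -2/5 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & -5/2 \\ -2 & -13/2 \end{bmatrix}$

(c) $[w]_B = \begin{bmatrix} -17/10 \\ 8/5 \end{bmatrix}$, $[w]_{B'} = \begin{bmatrix} -4 \\ -7 \end{bmatrix}$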

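8. Consider the bases $B = \{u_1, u_2, u_3\}$ and $B' = \{u_1', u_2', u_3'\}$ for $R^3$, where

$$u_1 = \begin{bmatrix} -3 \\ 0 \\ -3 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -3 \\ 2 \\ -1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1 \\ 6 \\ -1 \end{bmatrix}$$

$$u_1' = \begin{bmatrix} -6 \\ -6 \\ 0 \end{bmatrix}, \quad u_2' = \begin{bmatrix} -2 \\ -6 \\ 4 \end{bmatrix}, \quad u_3' = \begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix}$$

(a) Find the transition matrix from $B$ to $B'$.

(b) Compute the coordinate vector $[w]_B$, where

$$w = \begin{bmatrix} -5 \\ 8 \\ -5 \end{bmatrix}$$

and use (12) to compute $[w]_{B'}$.

(c) Check your work by computing $[w]_{B'}$ directly.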

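9. Repeat the directions of Exercise 8 with the same vector $w$, but with

$$u_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$$

$$u_1' = \begin{bmatrix} 3 \\ 1 \\ -5 \end{bmatrix}, \quad u_2' = \begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}, \quad u_3' = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$$

Answer:

(b) $[w]_B = \begin{bmatrix} 9 \\ -9 \\ -5 \end{bmatrix}$, $[w]_{B'} = \begin{bmatrix} -7/2 \\ 23/2 \\ 6 \end{bmatrix}$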

10. Consider the bases $B = \{p_1, p_2\}$ and $B' = \{q_1, q_2\}$ for $P_1$, where

$$p_1 = 6 + 3x, \quad p_2 = 10 + 2x, \quad q_1 = 2, \quad q_2 = 3 + 2x$$

(a) Find the transition matrix from $B'$ to $B$.

(b) Find the transition matrix from $B$ to $B'$.

(c) Compute the coordinate vector $[p]_B$, where $p = -4 + x$, and use (12) to compute $[p]_{B'}$.

(d) Check your work by computing $[p]_{B'}$ directly.

11. Let $V$ be the space spanned by $f_1 = \sin x$ and $f_2 = \cos x$.

(a) Show that $g_1 = 2\sin x + \cos x$ and $g_2 = 3\cos x$ form a basis for $V$.

(b) Find the transition matrix from $B' = \{g_1, g_2\}$ to $B = \{f_1, f_2\}$.

(c) Find the transition matrix from $B$ to $B'$.

(d) Compute the coordinate vector $[h]_B$, where $h = 2\sin x - 5\cos x$, and use (12) to obtain $[h]_{B'}$.

(e) Check your work by computing $[h]_{B'}$ directly.

Answer:

(b) $\begin{bmatrix} 2 & 0 \\ 1 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} 1/2 & 0 \\ -1/6 & 1/3 \end{bmatrix}$



12. Let $S$ be the standard basis for $R^2$, and let $B = \{v_1, v_2\}$ be the basis in which $v_1 = (2, 1)$ and
$v_2 = (-3, 4)$.

(a) Find the transition matrix $P_{B \to S}$ by inspection.

(b) Use Formula 14 to find the transition matrix $P_{S \to B}$.

(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.

(d) Let $w = (5, -3)$. Find $[w]_B$ and then use Formula 11 to compute $[w]_S$.

(e) Let $w = (3, -5)$. Find $[w]_S$ and then use Formula 12 to compute $[w]_B$.

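13. Let $S$ be the standard basis for $R^3$, and let $B = \{v_1, v_2, v_3\}$ be the basis in which $v_1 = (1, 2, 1)$,
$v_2 = (2, 5, 0)$, and $v_3 = (3, 3, 8)$.

(a) Find the transition matrix $P_{B \to S}$ by inspection.

(b) Use Formula 14 to find the transition matrix $P_{S \to B}$.

(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.

(d) Let $w = (5, -3, 1)$. Find $[w]_B$ and then use Formula 11 to compute $[w]_S$.

(e) Let $w = (3, -5, 0)$. Find $[w]_S$ and then use Formula 12 to compute $[w]_B$.

Answer:

(a) $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$

(b) $\begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$

(d) $[w]_B = \begin{bmatrix} -239 \\ 77 \\ 30 \end{bmatrix}$, $[w]_S = \begin{bmatrix} 5 \\ -3 \\ 1 \end{bmatrix}$

(e) $[w]_S = \begin{bmatrix} 3 \\ -5 \\ 0 \end{bmatrix}$, $[w]_B = \begin{bmatrix} -200 \\ 64 \\ 25 \end{bmatrix}$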

14. Let $B_1 = \{u_1, u_2\}$ and $B_2 = \{v_1, v_2\}$ be the bases for $R^2$ in which
$u_1 = (2, 2)$, $u_2 = (4, -1)$, $v_1 = (1, 3)$, and $v_2 = (-1, -1)$.

(a) Use Formula 14 to find the transition matrix $P_{B_2 \to B_1}$.

(b) Use Formula 14 to find the transition matrix $P_{B_1 \to B_2}$.

(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.

(d) Let $w = (5, -3)$. Find $[w]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[w]_{B_2}$ from $[w]_{B_1}$.

(e) Let $w = (3, -5)$. Find $[w]_{B_2}$, and then use the matrix $P_{B_2 \to B_1}$ to compute $[w]_{B_1}$ from $[w]_{B_2}$.

15. Let $B_1 = \{u_1, u_2\}$ and $B_2 = \{v_1, v_2\}$ be the bases for $R^2$ in which $u_1 = (1, 2)$, $u_2 = (2, 3)$,
$v_1 = (1, 3)$, and $v_2 = (1, 4)$.

(a) Use Formula 14 to find the transition matrix $P_{B_2 \to B_1}$.

(b) Use Formula 14 to find the transition matrix $P_{B_1 \to B_2}$.

(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.

(d) Let $w = (0, 1)$. Find $[w]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[w]_{B_2}$ from $[w]_{B_1}$.

(e) Let $w = (2, 5)$. Find $[w]_{B_2}$, and then use the matrix $P_{B_2 \to B_1}$ to compute $[w]_{B_1}$ from $[w]_{B_2}$.


Answer: 



16. Let $B_1 = \{u_1, u_2, u_3\}$ and $B_2 = \{v_1, v_2, v_3\}$ be the bases for $R^3$ in which $u_1 = (-3, 0, -3)$,
$u_2 = (-3, 2, -1)$, $u_3 = (1, 6, -1)$, $v_1 = (-6, -6, 0)$, $v_2 = (-2, -6, 4)$, and $v_3 = (-2, -3, 7)$.

(a) Find the transition matrix $P_{B_1 \to B_2}$.

(b) Let $w = (-5, 8, -5)$. Find $[w]_{B_1}$ and then use the transition matrix obtained in part (a) to
compute $[w]_{B_2}$ by matrix multiplication.

(c) Check the result in part (b) by computing $[w]_{B_2}$ directly.

17. Follow the directions of Exercise 16 with the same vector $w$ but with $u_1 = (2, 1, 1)$, $u_2 = (2, -1, 1)$,
$u_3 = (1, 2, 1)$, $v_1 = (3, 1, -5)$, $v_2 = (1, 1, -3)$, and $v_3 = (-1, 0, 2)$.

Answer:

(b) $[w]_{B_1} = \begin{bmatrix} 9 \\ -9 \\ -5 \end{bmatrix}$, $[w]_{B_2} = \begin{bmatrix} -7/2 \\ 23/2 \\ 6 \end{bmatrix}$

18. Let $S = \{e_1, e_2\}$ be the standard basis for $R^2$, and let $B = \{v_1, v_2\}$ be the basis that results when the
vectors in $S$ are reflected about the line $y = x$.

(a) Find the transition matrix $P_{B \to S}$.

(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

19. Let $S = \{e_1, e_2\}$ be the standard basis for $R^2$, and let $B = \{v_1, v_2\}$ be the basis that results when the
vectors in $S$ are reflected about the line that makes an angle $\theta$ with the positive $x$-axis.

(a) Find the transition matrix $P_{B \to S}$.

(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

Answer:

(a) $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$

20. If $B_1$, $B_2$, and $B_3$ are bases for $R^2$, and if

Pb 1 -+B 2 = 5 2 and P *2-*3 = \ 

then $P_{B_3 \to B_1} =$ ____.

21. If $P$ is the transition matrix from a basis $B'$ to a basis $B$, and $Q$ is the transition matrix from $B$ to a basis $C$,
what is the transition matrix from $B'$ to $C$? What is the transition matrix from $C$ to $B'$?

22. To write the coordinate vector for a vector, it is necessary to specify an order for the vectors in the basis. If
$P$ is the transition matrix from a basis $B'$ to a basis $B$, what is the effect on $P$ if we reverse the order of
vectors in $B$ from $v_1, \ldots, v_n$ to $v_n, \ldots, v_1$? What is the effect on $P$ if we reverse the order of vectors in
both $B'$ and $B$?

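23. Consider the matrix

$$P = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix}$$

(a) $P$ is the transition matrix from what basis $B$ to the standard basis $S = \{e_1, e_2, e_3\}$ for $R^3$?

(b) $P$ is the transition matrix from the standard basis $S = \{e_1, e_2, e_3\}$ to what basis $B$ for $R^3$?

Answer:

(a) $B = \{(1, 1, 0), (1, 0, 2), (0, 2, 1)\}$

(b) $B = \{(\tfrac{4}{5}, \tfrac{1}{5}, -\tfrac{2}{5}), (\tfrac{1}{5}, -\tfrac{1}{5}, \tfrac{2}{5}), (-\tfrac{2}{5}, \tfrac{2}{5}, \tfrac{1}{5})\}$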




24. The matrix

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 2 \\ 0 & 1 & 1 \end{bmatrix}$$

is the transition matrix from what basis $B$ to the basis $\{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$ for $R^3$?


25. Let $B$ be a basis for $R^n$. Prove that the vectors $v_1, v_2, \ldots, v_k$ form a linearly independent set in $R^n$ if and
only if the vectors $[v_1]_B, [v_2]_B, \ldots, [v_k]_B$ form a linearly independent set in $R^n$.

26. Let $B$ be a basis for $R^n$. Prove that the vectors $v_1, v_2, \ldots, v_k$ span $R^n$ if and only if the vectors
$[v_1]_B, [v_2]_B, \ldots, [v_k]_B$ span $R^n$.

27. If $[w]_B = w$ holds for all vectors $w$ in $R^n$, what can you say about the basis $B$?


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) If $B_1$ and $B_2$ are bases for a vector space $V$, then there exists a transition matrix from $B_1$ to $B_2$.

Answer: 

True 

(b) Transition matrices are invertible. 

Answer: 

True 

(c) If $B$ is a basis for a vector space $R^n$, then $P_{B \to B}$ is the identity matrix.

Answer: 

True 

(d) If $P_{B_1 \to B_2}$ is a diagonal matrix, then each vector in $B_2$ is a scalar multiple of some vector in $B_1$.

Answer:

True 

(e) If each vector in $B_2$ is a scalar multiple of some vector in $B_1$, then $P_{B_1 \to B_2}$ is a diagonal matrix.

Answer: 

False 

(f) If $A$ is a square matrix, then $A = P_{B_1 \to B_2}$ for some bases $B_1$ and $B_2$ for $R^n$.

Answer: 


False 
4.7 Row Space, Column Space, and Null Space

In this section we will study some important vector spaces that are associated with matrices. Our work here will provide 
us with a deeper understanding of the relationships between the solutions of a linear system and properties of its 
coefficient matrix. 


Row Space, Column Space, and Null Space 

Recall that vectors can be written in comma-delimited form or in matrix form as either row vectors or column vectors. 
In this section we will use the latter two. 


DEFINITION 1 

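For an $m \times n$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

the vectors

$$\begin{aligned} r_1 &= [a_{11} \;\; a_{12} \;\; \cdots \;\; a_{1n}] \\ r_2 &= [a_{21} \;\; a_{22} \;\; \cdots \;\; a_{2n}] \\ &\;\;\vdots \\ r_m &= [a_{m1} \;\; a_{m2} \;\; \cdots \;\; a_{mn}] \end{aligned}$$

in $R^n$ that are formed from the rows of $A$ are called the row vectors of $A$, and the vectors

$$c_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad c_2 = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad c_n = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$

in $R^m$ formed from the columns of $A$ are called the column vectors of $A$.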


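EXAMPLE 1 Row and Column Vectors of a 2 × 3 Matrix

Let

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & -1 & 4 \end{bmatrix}$$

The row vectors of $A$ are

$$r_1 = [2 \;\; 1 \;\; 0] \quad \text{and} \quad r_2 = [3 \;\; -1 \;\; 4]$$

and the column vectors of $A$ are

$$c_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad c_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad c_3 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}$$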


















The following definition defines three important vector spaces associated with a matrix. 


DEFINITION 2 

If $A$ is an $m \times n$ matrix, then the subspace of $R^n$ spanned by the row vectors of $A$ is called the row space of $A$,
and the subspace of $R^m$ spanned by the column vectors of $A$ is called the column space of $A$. The solution space
of the homogeneous system of equations $Ax = 0$, which is a subspace of $R^n$, is called the null space of $A$.


In this section and the next we will be concerned with two general questions: 

Question 1. What relationships exist among the solutions of a linear system $Ax = b$ and the row space, column space,
and null space of the coefficient matrix $A$?

Question 2. What relationships exist among the row space, column space, and null space of a matrix? 

Starting with the first question, suppose that

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad \text{and} \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

It follows from Formula 10 of Section 1.3 that if $c_1, c_2, \ldots, c_n$ denote the column vectors of $A$, then the product $Ax$ can
be expressed as a linear combination of these vectors with coefficients from $x$; that is,

$$Ax = x_1 c_1 + x_2 c_2 + \cdots + x_n c_n \qquad (1)$$

Thus, a linear system $Ax = b$ of $m$ equations in $n$ unknowns can be written as

$$x_1 c_1 + x_2 c_2 + \cdots + x_n c_n = b \qquad (2)$$

from which we conclude that $Ax = b$ is consistent if and only if $b$ is expressible as a linear combination of the column
vectors of $A$. This yields the following theorem.


THEOREM 4.7.1 

A system of linear equations Ax = b is consistent if and only if b is in the column space of A. 

EXAMPLE 2 A Vector b in the Column Space of A 


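Let $Ax = b$ be the linear system

$$\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$

Show that $b$ is in the column space of $A$ by expressing it as a linear combination of the column vectors of
$A$.

Solving the system by Gaussian elimination yields (verify)

$$x_1 = 2, \quad x_2 = -1, \quad x_3 = 3$$

It follows from this and Formula 2 that

$$2\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + 3\begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$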
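Formula (1) can be confirmed on the data of Example 2. This Python sketch (the names are illustrative) computes $Ax$ in two ways. The first is the usual row-times-column product. The second is the combination $x_1 c_1 + x_2 c_2 + x_3 c_3$ of the columns of $A$.

```python
A = [[-1, 3, 2],
     [1, 2, -3],
     [2, 1, -2]]
x = [2, -1, 3]  # solution found in Example 2

# Row-times-column product Ax
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Formula (1): Ax = x1*c1 + x2*c2 + x3*c3, where c_j is the j-th column of A
columns = [[row[j] for row in A] for j in range(3)]
combo = [sum(x[j] * columns[j][i] for j in range(3)) for i in range(3)]

print(Ax, combo)  # [1, -9, -3] [1, -9, -3]
```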


Recall from Theorem 3.4.4 that the general solution of a consistent linear system $Ax = b$ can be obtained by adding any
specific solution of this system to the general solution of the corresponding homogeneous system $Ax = 0$. Keeping in
mind that the null space of $A$ is the same as the solution space of $Ax = 0$, we can rephrase that theorem in the following
vector form.


THEOREM 4.7.2 

If $x_0$ is any solution of a consistent linear system $Ax = b$, and if $S = \{v_1, v_2, \ldots, v_k\}$ is a basis for the null
space of $A$, then every solution of $Ax = b$ can be expressed in the form

$$x = x_0 + c_1 v_1 + c_2 v_2 + \cdots + c_k v_k \qquad (3)$$

Conversely, for all choices of scalars $c_1, c_2, \ldots, c_k$, the vector $x$ in this formula is a solution of $Ax = b$.


Equation 3 gives a formula for the general solution of $Ax = b$. The vector $x_0$ in that formula is called a particular
solution of $Ax = b$, and the remaining part of the formula is called the general solution of $Ax = 0$. In words, this
formula tells us that:

The general solution of a consistent linear system can be expressed as the sum of a particular solution of that system
and the general solution of the corresponding homogeneous system.

Geometrically, the solution set of $Ax = b$ can be viewed as the translation by $x_0$ of the solution space of $Ax = 0$ (Figure
4.7.1).


















Figure 4.7.1


EXAMPLE 3 General Solution of a Linear System Ax = b 

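In the concluding subsection of Section 3.4 we compared solutions of the linear systems

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

and

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix}$$

and deduced that the general solution $x$ of the nonhomogeneous system and the general solution $x_h$ of the
corresponding homogeneous system (when written in column-vector form) are related by

$$\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}}_{x_0} + \underbrace{r\begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{x_h}$$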

Recall from the Remark following Example 4 of Section 4.5 that the vectors in $x_h$ form a basis for the solution space of
$Ax = 0$.
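Theorem 4.7.2 can be spot-checked on the systems of Example 3. This sketch uses the particular solution $x_0 = (0, 0, 0, 0, 0, \tfrac{1}{3})$. It also uses the three homogeneous basis vectors $v_1, v_2, v_3$, which are the same vectors that appear in Example 4. It plugs a few arbitrary choices of the parameters $r$, $s$, $t$ into $x = x_0 + r v_1 + s v_2 + t v_3$ and confirms that each resulting $x$ satisfies $Ax = b$. The particular parameter values have no special meaning.

```python
from fractions import Fraction

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
b = [0, -1, 5, 6]

x0 = [0, 0, 0, 0, 0, Fraction(1, 3)]  # a particular solution of Ax = b
v1 = [-3, 1, 0, 0, 0, 0]              # basis for the null space of A
v2 = [-4, 0, -2, 1, 0, 0]
v3 = [-2, 0, 0, 0, 1, 0]

def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Every choice of r, s, t should give a solution of Ax = b (Theorem 4.7.2).
for r, s, t in [(0, 0, 0), (1, 0, 0), (2, -1, 5), (Fraction(1, 2), 3, -4)]:
    x = [a + r * p + s * q + t * w for a, p, q, w in zip(x0, v1, v2, v3)]
    assert mat_vec(A, x) == b
print("all tested choices of r, s, t solve Ax = b")
```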


Bases for Row Spaces, Column Spaces, and Null Spaces 

We first developed elementary row operations for the purpose of solving linear systems, and we know from that work
that performing an elementary row operation on an augmented matrix does not change the solution set of the
corresponding linear system. It follows that applying an elementary row operation to a matrix $A$ does not change the
solution set of the corresponding linear system $Ax = 0$, or, stated another way, it does not change the null space of $A$.
Thus we have the following theorem.

































THEOREM 4.7.3 


Elementary row operations do not change the null space of a matrix. 


The following theorem, whose proof is left as an exercise, is a companion to Theorem 4.7.3. 


THEOREM 4.7.4 

Elementary row operations do not change the row space of a matrix. 


Theorems 4.7.3 and 4.7.4 might tempt you into incorrectly believing that elementary row operations do not change the
column space of a matrix. To see why this is not true, compare the matrices

$$A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}$$

The matrix $B$ can be obtained from $A$ by adding $-2$ times the first row to the second. However, this operation has
changed the column space of $A$, since that column space consists of all scalar multiples of

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

whereas the column space of $B$ consists of all scalar multiples of

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

and the two are different spaces.


EXAMPLE 4 Finding a Basis for the Null Space of a Matrix 

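Find a basis for the null space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$

The null space of $A$ is the solution space of the homogeneous linear system $Ax = 0$, which, as
shown in Example 3, has the basis

$$v_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$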


Observe that the basis vectors $v_1$, $v_2$, and $v_3$ in the last example are the vectors that result by successively
setting one of the parameters in the general solution equal to 1 and the others equal to 0.
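The observation above amounts to an algorithm. First reduce $A$ to reduced row echelon form. Then, for each free variable, build the vector in which that variable equals 1 and the other free variables equal 0. The Python sketch below does this with exact `Fraction` arithmetic; the function names `rref` and `null_space_basis` are hypothetical. Run on the matrix of Example 4, it should reproduce $v_1$, $v_2$, $v_3$.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix (list of rows) and its pivot column indices."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no leading 1 in this column: it belongs to a free variable
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def null_space_basis(A):
    """One basis vector per free variable: set it to 1 and the other free variables to 0."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, pc in zip(R, pivots):
            v[pc] = -row[f]  # leading variable = -(coefficient of this free variable)
        basis.append(v)
    return basis

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
for v in null_space_basis(A):
    print([int(x) for x in v])  # [-3, 1, 0, 0, 0, 0], [-4, 0, -2, 1, 0, 0], [-2, 0, 0, 0, 1, 0]
```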

The following theorem makes it possible to find bases for the row and column spaces of a matrix in row echelon form 
by inspection. 


THEOREM 4.7.5 

If a matrix $R$ is in row echelon form, then the row vectors with the leading 1's (the nonzero row vectors) form a
basis for the row space of $R$, and the column vectors with the leading 1's of the row vectors form a basis for the
column space of $R$.

The proof involves little more than an analysis of the positions of the 0's and 1's of $R$. We omit the details.


EXAMPLE 5 Bases for Row and Column Spaces 

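The matrix

$$R = \begin{bmatrix} 1 & -2 & 5 & 0 & 3 \\ 0 & 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is in row echelon form. From Theorem 4.7.5, the vectors

$$\begin{aligned} r_1 &= [1 \;\; -2 \;\; 5 \;\; 0 \;\; 3] \\ r_2 &= [0 \;\; 1 \;\; 3 \;\; 0 \;\; 0] \\ r_3 &= [0 \;\; 0 \;\; 0 \;\; 1 \;\; 0] \end{aligned}$$

form a basis for the row space of $R$, and the vectors

$$c_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad c_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad c_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for the column space of $R$.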
















EXAMPLE 6 Basis for a Row Space by Row Reduction 

Find a basis for the row space of the matrix

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$

Since elementary row operations do not change the row space of a matrix, we can find a basis
for the row space of $A$ by finding a basis for the row space of any row echelon form of $A$. Reducing $A$ to
row echelon form, we obtain (verify)

$$R = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 0 & 0 & 1 & 3 & -2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

By Theorem 4.7.5, the nonzero row vectors of $R$ form a basis for the row space of $R$ and hence form a
basis for the row space of $A$. These basis vectors are

$$\begin{aligned} r_1 &= [1 \;\; -3 \;\; 4 \;\; -2 \;\; 5 \;\; 4] \\ r_2 &= [0 \;\; 0 \;\; 1 \;\; 3 \;\; -2 \;\; -6] \\ r_3 &= [0 \;\; 0 \;\; 0 \;\; 0 \;\; 1 \;\; 5] \end{aligned}$$
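Here is the method of Example 6 in code, as a sketch in which the helper names are hypothetical. It uses the reduced row echelon form instead of the row echelon form $R$ shown above, so the basis vectors it returns differ from $r_1, r_2, r_3$. Both sets span the same row space. The final assertion illustrates this for $r_1$ by writing it as a combination of the computed rows.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form (list of rows) and pivot column indices, exact arithmetic."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def row_space_basis(A):
    """Nonzero rows of the reduced row echelon form of A (Theorems 4.7.4 and 4.7.5)."""
    R, _ = rref(A)
    return [row for row in R if any(x != 0 for x in row)]

A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]
w1, w2, w3 = row_space_basis(A)
print([int(x) for x in w1])  # [1, -3, 0, -14, 0, -37]

# The text's r1 lies in the span of the computed rows: r1 = w1 + 4*w2 + 5*w3.
r1 = [1, -3, 4, -2, 5, 4]
assert [a + 4 * b + 5 * c for a, b, c in zip(w1, w2, w3)] == r1
```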


The problem of finding a basis for the column space of a matrix $A$ in Example 6 is complicated by the fact that an
elementary row operation can alter its column space. However, the good news is that elementary row operations do not
alter dependence relationships among the column vectors. To make this more precise, suppose that $w_1, w_2, \ldots, w_k$ are
linearly dependent column vectors of $A$, so there are scalars $c_1, c_2, \ldots, c_k$ that are not all zero and such that

$$c_1 w_1 + c_2 w_2 + \cdots + c_k w_k = 0 \qquad (4)$$

If we perform an elementary row operation on $A$, then these vectors will be changed into new column vectors
$w_1', w_2', \ldots, w_k'$. At first glance it would seem possible that the transformed vectors might be linearly independent.
However, this is not so, since it can be proved that these new column vectors will be linearly dependent and, in fact,
related by an equation

$$c_1 w_1' + c_2 w_2' + \cdots + c_k w_k' = 0$$

that has exactly the same coefficients as (4). It follows from the fact that elementary row operations are reversible that
they also preserve linear independence among column vectors (why?). The following theorem summarizes all of these
results.


THEOREM 4.7.6 

If A and B are row equivalent matrices, then: 

(a) A given set of column vectors of A is linearly independent if and only if the corresponding column vectors 
of B are linearly independent. 






(b) A given set of column vectors of A forms a basis for the column space of A if and only if the corresponding 
column vectors of B form a basis for the column space of B. 


EXAMPLE 7 Basis for a Column Space by Row Reduction 

Find a basis for the column space of the matrix 

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$

We observed in Example 6 that the matrix

$$R = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 0 & 0 & 1 & 3 & -2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is a row echelon form of $A$. Keeping in mind that $A$ and $R$ can have different column spaces, we cannot
find a basis for the column space of $A$ directly from the column vectors of $R$. However, it follows from
Theorem 4.7.6(b) that if we can find a set of column vectors of $R$ that forms a basis for the column space of
$R$, then the corresponding column vectors of $A$ will form a basis for the column space of $A$.

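Since the first, third, and fifth columns of $R$ contain the leading 1's of the row vectors, the vectors

$$c_1' = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad c_3' = \begin{bmatrix} 4 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad c_5' = \begin{bmatrix} 5 \\ -2 \\ 1 \\ 0 \end{bmatrix}$$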

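form a basis for the column space of $R$. Thus, the corresponding column vectors of $A$, which are

$$c_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \\ -1 \end{bmatrix}, \quad c_3 = \begin{bmatrix} 4 \\ 9 \\ 9 \\ -4 \end{bmatrix}, \quad c_5 = \begin{bmatrix} 5 \\ 8 \\ 9 \\ -5 \end{bmatrix}$$

form a basis for the column space of $A$.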
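Theorem 4.7.6(b) suggests a compact procedure. Find the pivot columns of any echelon form of $A$, then take the columns in those positions from $A$ itself. Below is a Python sketch with hypothetical helper names. It relies on the fact that the pivot columns of the reduced form are the same as the leading-1 columns of $R$, because pivot positions do not depend on which echelon form is used.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form (list of rows) and pivot column indices, exact arithmetic."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def column_space_basis(A):
    """Columns of A itself (not of its echelon form) in the pivot positions (Theorem 4.7.6(b))."""
    _, pivots = rref(A)
    return [[row[j] for row in A] for j in pivots]

A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]
print(column_space_basis(A))  # [[1, 2, 2, -1], [4, 9, 9, -4], [5, 8, 9, -5]]
```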


Up to now we have focused on methods for finding bases associated with matrices. Those methods can readily be 
adapted to the more general problem of finding a basis for the space spanned by a set of vectors in R n . 

EXAMPLE 8 Basis for a Vector Space Using Row Operations 

Find a basis for the subspace of $R^5$ spanned by the vectors

$$v_1 = (1, -2, 0, 0, 3), \quad v_2 = (2, -5, -3, -2, 6), \quad v_3 = (0, 5, 15, 10, 0), \quad v_4 = (2, 6, 18, 8, 6)$$

The space spanned by these vectors is the row space of the matrix

$$\begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}$$

Reducing this matrix to row echelon form, we obtain

$$\begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 0 & 1 & 3 & 2 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

The nonzero row vectors in this matrix are

$$w_1 = (1, -2, 0, 0, 3), \quad w_2 = (0, 1, 3, 2, 0), \quad w_3 = (0, 0, 1, 1, 0)$$

These vectors form a basis for the row space and consequently form a basis for the subspace of $R^5$
spanned by $v_1$, $v_2$, $v_3$, and $v_4$.


Bases Formed from Row and Column Vectors of a Matrix 


In all of the examples we have considered thus far we have looked for bases in which no restrictions were imposed on 
the individual vectors in the basis. We now want to focus on the problem of finding a basis for the row space of a matrix 
A consisting entirely of row vectors from A and a basis for the column space of A consisting entirely of column vectors 
of A. 


Looking back on our earlier work, we see that the procedure followed in Example 7 did, in fact, produce a basis for the 
column space of A consisting of column vectors of A, whereas the procedure used in Example 6 produced a basis for the 
row space of A, but that basis did not consist of row vectors of A. The following example shows how to adapt the 
procedure from Example 7 to find a basis for the row space of a matrix that is formed from its row vectors. 


EXAMPLE 9 Basis for the Row Space of a Matrix 

Find a basis for the row space of 

$$A = \begin{bmatrix}1&-2&0&0&3\\2&-5&-3&-2&6\\0&5&15&10&0\\2&6&18&8&6\end{bmatrix}$$


consisting entirely of row vectors from A. 


We will transpose A, thereby converting the row space of A into the column space of $A^T$; then
we will use the method of Example 7 to find a basis for the column space of $A^T$; and then we will
transpose again to convert column vectors back to row vectors. Transposing A yields








$$A^T = \begin{bmatrix}1&2&0&2\\-2&-5&5&6\\0&-3&15&18\\0&-2&10&8\\3&6&0&6\end{bmatrix}$$

Reducing this matrix to row echelon form yields 

$$\begin{bmatrix}1&2&0&2\\0&1&-5&-10\\0&0&0&1\\0&0&0&0\\0&0&0&0\end{bmatrix}$$

The first, second, and fourth columns contain the leading 1's, so the corresponding column vectors in $A^T$
form a basis for the column space of $A^T$; these are


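$$\mathbf{c}_1 = \begin{bmatrix}1\\-2\\0\\0\\3\end{bmatrix},\quad \mathbf{c}_2 = \begin{bmatrix}2\\-5\\-3\\-2\\6\end{bmatrix},\quad \mathbf{c}_4 = \begin{bmatrix}2\\6\\18\\8\\6\end{bmatrix}$$

Transposing again and adjusting the notation appropriately yields the basis vectors

$$\mathbf{r}_1 = \begin{bmatrix}1&-2&0&0&3\end{bmatrix},\quad \mathbf{r}_2 = \begin{bmatrix}2&-5&-3&-2&6\end{bmatrix},\quad \mathbf{r}_4 = \begin{bmatrix}2&6&18&8&6\end{bmatrix}$$

for the row space of A.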
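The transpose trick of Example 9 comes down to two steps: locate the pivot (leading-1) columns of $A^T$, then take the correspondingly numbered rows of A. The sketch below assumes exact arithmetic via `fractions.Fraction`; `pivot_columns` and `row_basis_from_rows` are hypothetical helper names. Only forward elimination is used, because the positions of the leading 1's are the same in every row echelon form of a matrix.

```python
from fractions import Fraction


def pivot_columns(rows):
    """0-based indices of the columns that receive leading entries in a row echelon form."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, m):  # eliminate the entries below the pivot
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return pivots


def row_basis_from_rows(A):
    """Basis for the row space of A made of row vectors of A (method of Example 9)."""
    At = [list(col) for col in zip(*A)]  # rows of A become the columns of A^T
    return [A[j] for j in pivot_columns(At)]


A = [[1, -2, 0, 0, 3], [2, -5, -3, -2, 6], [0, 5, 15, 10, 0], [2, 6, 18, 8, 6]]
print(pivot_columns(list(zip(*A))))  # pivot columns of A^T, counted from 0
print(row_basis_from_rows(A))
```

The pivot indices 0, 1, 3 correspond to the first, second, and fourth columns of $A^T$, so the sketch returns rows 1, 2, and 4 of A, in agreement with the example.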


Next, we will give an example that adapts the methods we have developed above to solve the following general
problem in $R^n$:

PROBLEM

Given a set of vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ in $R^n$, find a subset of these vectors that forms a basis for span(S),
and express those vectors that are not in that basis as a linear combination of the basis vectors.


EXAMPLE 10 Basis and Linear Combinations 

Find a subset of the vectors 

$$\mathbf{v}_1 = (1, -2, 0, 3),\quad \mathbf{v}_2 = (2, -5, -3, 6),$$

$$\mathbf{v}_3 = (0, 1, 3, 0),\quad \mathbf{v}_4 = (2, -1, 4, -7),\quad \mathbf{v}_5 = (5, -8, 1, 2)$$

that forms a basis for the space spanned by these vectors. 

Express each vector not in the basis as a linear combination of the basis vectors. 


Solution 












We begin by constructing a matrix that has $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5$ (in that order) as its column vectors:

$$\begin{bmatrix}1&2&0&2&5\\-2&-5&1&-1&-8\\0&-3&3&4&1\\3&6&0&-7&2\end{bmatrix} \tag{5}$$


The first part of our problem can be solved by finding a basis for the column space of this matrix.
Reducing the matrix to reduced row echelon form and denoting the column vectors of the resulting
matrix by $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \mathbf{w}_4$, and $\mathbf{w}_5$ yields

$$\begin{bmatrix}1&0&2&0&1\\0&1&-1&0&1\\0&0&0&1&1\\0&0&0&0&0\end{bmatrix} \tag{6}$$


The leading 1's occur in columns 1, 2, and 4, so by Theorem 4.7.5,

$$\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_4\}$$

is a basis for the column space of (6), and consequently,

$$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$$

is a basis for the column space of (5).


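We will start by expressing $\mathbf{w}_3$ and $\mathbf{w}_5$ as linear combinations of the basis vectors $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_4$. The
simplest way of doing this is to express $\mathbf{w}_3$ and $\mathbf{w}_5$ in terms of basis vectors with smaller subscripts.
Accordingly, we will express $\mathbf{w}_3$ as a linear combination of $\mathbf{w}_1$ and $\mathbf{w}_2$, and we will express $\mathbf{w}_5$ as a
linear combination of $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_4$. By inspection of (6), these linear combinations are

$$\mathbf{w}_3 = 2\mathbf{w}_1 - \mathbf{w}_2,\qquad \mathbf{w}_5 = \mathbf{w}_1 + \mathbf{w}_2 + \mathbf{w}_4$$

We call these the dependency equations. The corresponding relationships in (5) are

$$\mathbf{v}_3 = 2\mathbf{v}_1 - \mathbf{v}_2,\qquad \mathbf{v}_5 = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_4$$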


The following is a summary of the steps that we followed in our last example to solve the problem posed above. 

Basis for Span(S) 

Step 1. Form the matrix A having the vectors in $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ as column vectors.

Step 2. Reduce the matrix A to reduced row echelon form R. 

Step 3. Denote the column vectors of R by $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$.

Step 4. Identify the columns of R that contain the leading 1's. The corresponding column vectors of A form a basis for
span(S).

This completes the first part of the problem. 

Step 5. Obtain a set of dependency equations by expressing each column vector of R that does not contain a leading 1
as a linear combination of preceding column vectors that do contain leading 1's.






Step 6. Replace the column vectors of R that appear in the dependency equations by the corresponding column vectors 
of A. 

This completes the second part of the problem. 
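Steps 1 through 6 can be sketched in Python as follows. This is an illustrative sketch, not the book's code: it assumes exact arithmetic with `fractions.Fraction`, and `rref` and `basis_and_dependencies` are hypothetical helper names. It relies on the fact used in Example 10: column j of R equals the sum of R[i][j] times the i-th pivot column of R, and row operations preserve that same relation among the columns of A.

```python
from fractions import Fraction


def rref(rows):
    """Gauss-Jordan elimination with exact arithmetic.

    Returns (R, pivots): the reduced row echelon form and its 0-based pivot columns.
    """
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):  # clear column c above and below the leading 1
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def basis_and_dependencies(vectors):
    """Return (pivots, deps) for the set S given as a list of vectors."""
    A = [list(row) for row in zip(*vectors)]  # Step 1: the vectors become columns of A
    R, pivots = rref(A)                         # Steps 2-3: columns of R are w_1, ..., w_k
    # Step 4: the pivot indices select the basis vectors from S.
    deps = {}
    for j in range(len(vectors)):               # Steps 5-6: each column without a leading 1
        if j not in pivots:
            deps[j] = {p: R[i][j] for i, p in enumerate(pivots) if R[i][j] != 0}
    return pivots, deps


S = [(1, -2, 0, 3), (2, -5, -3, 6), (0, 1, 3, 0), (2, -1, 4, -7), (5, -8, 1, 2)]
pivots, deps = basis_and_dependencies(S)
print("basis:", ["v%d" % (p + 1) for p in pivots])
for j, combo in deps.items():
    terms = " + ".join("(%s)v%d" % (c, p + 1) for p, c in combo.items())
    print("v%d = %s" % (j + 1, terms))
```

For the vectors of Example 10 this selects v1, v2, v4 and reports the dependency equations v3 = 2v1 - v2 and v5 = v1 + v2 + v4.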


Concept Review 

Row vectors
Column vectors 
Row space 
Column space 
Null space 
General solution 
Particular solution 

Relationships among linear systems and row spaces, column spaces, and null spaces 
Relationships among the row space, column space, and null space of a matrix 
Dependency equations 

Skills 

Determine whether a given vector is in the column space of a matrix; if it is, express it as a linear 
combination of the column vectors of the matrix. 

Find a basis for the null space of a matrix. 

Find a basis for the row space of a matrix. 

Find a basis for the column space of a matrix. 

Find a basis for the span of a set of vectors in $R^n$.


Exercise Set 4.7 

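1. List the row vectors and column vectors of the matrix

$$\begin{bmatrix}2&-1&0&1\\3&5&7&-1\\1&4&2&7\end{bmatrix}$$

Answer:

$\mathbf{r}_1 = (2, -1, 0, 1)$, $\mathbf{r}_2 = (3, 5, 7, -1)$, $\mathbf{r}_3 = (1, 4, 2, 7)$;

$$\mathbf{c}_1 = \begin{bmatrix}2\\3\\1\end{bmatrix},\quad \mathbf{c}_2 = \begin{bmatrix}-1\\5\\4\end{bmatrix},\quad \mathbf{c}_3 = \begin{bmatrix}0\\7\\2\end{bmatrix},\quad \mathbf{c}_4 = \begin{bmatrix}1\\-1\\7\end{bmatrix}$$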


2. Express the product as a linear combination of the column vectors of A. 













3. Determine whether b is in the column space of A, and if so, express b as a linear combination of the column vectors
of A.

(b)

$$A = \begin{bmatrix}1&1&2\\1&0&1\\2&1&3\end{bmatrix},\quad \mathbf{b} = \begin{bmatrix}-1\\0\\2\end{bmatrix}$$

Answer:

(b) b is not in the column space of A.




4. Suppose that $x_1 = -1$, $x_2 = 2$, $x_3 = 4$, $x_4 = -3$ is a solution of a nonhomogeneous linear system Ax = b and that
the solution set of the homogeneous system Ax = 0 is given by the formulas

$$x_1 = -3r + 4s,\quad x_2 = r - s,\quad x_3 = r,\quad x_4 = s$$

(a) Find a vector form of the general solution of Ax = 0.

(b) Find a vector form of the general solution of Ax = b.

5. In parts (a)-(d), find the vector form of the general solution of the given linear system Ax = b; then use that result to
find the vector form of the general solution of Ax = 0.

(a)

$$\begin{aligned} x_1 - 3x_2 &= 1\\ 2x_1 - 6x_2 &= 2 \end{aligned}$$

(b)

$$\begin{aligned} x_1 + x_2 + 2x_3 &= 5\\ x_1 + x_3 &= -2\\ 2x_1 + x_2 + 3x_3 &= 3 \end{aligned}$$

(c)

$$\begin{aligned} x_1 - 2x_2 + x_3 + 2x_4 &= -1\\ 2x_1 - 4x_2 + 2x_3 + 4x_4 &= -2\\ -x_1 + 2x_2 - x_3 - 2x_4 &= 1\\ 3x_1 - 6x_2 + 3x_3 + 6x_4 &= -3 \end{aligned}$$

(d)

$$\begin{aligned} x_1 + 2x_2 - 3x_3 + x_4 &= 4\\ -2x_1 + x_2 + 2x_3 + x_4 &= -1\\ -x_1 + 3x_2 - x_3 + 2x_4 &= 3\\ 4x_1 - 7x_2 - 5x_4 &= -5 \end{aligned}$$


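Answer:

(b)

$$\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}-2\\7\\0\end{bmatrix} + t\begin{bmatrix}-1\\-1\\1\end{bmatrix};\qquad t\begin{bmatrix}-1\\-1\\1\end{bmatrix}$$

(d)

$$\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}6/5\\7/5\\0\\0\end{bmatrix} + s\begin{bmatrix}7/5\\4/5\\1\\0\end{bmatrix} + t\begin{bmatrix}1/5\\-3/5\\0\\1\end{bmatrix};\qquad s\begin{bmatrix}7/5\\4/5\\1\\0\end{bmatrix} + t\begin{bmatrix}1/5\\-3/5\\0\\1\end{bmatrix}$$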

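6. Find a basis for the null space of A.

(a)

$$A = \begin{bmatrix}1&-1&3\\5&-4&-4\\7&-6&2\end{bmatrix}$$

(b)

$$A = \begin{bmatrix}2&0&-1\\4&0&-2\\0&0&0\end{bmatrix}$$

(c)

$$A = \begin{bmatrix}1&4&5&2\\2&1&3&0\\-1&3&2&2\end{bmatrix}$$

(d)

$$A = \begin{bmatrix}1&4&5&6&9\\3&-2&1&4&-1\\-1&0&-1&-2&-1\\2&3&5&7&8\end{bmatrix}$$

(e)

$$A = \begin{bmatrix}1&-3&2&2&1\\0&3&6&0&-3\\2&-3&-2&4&4\\3&-6&0&6&5\\-2&9&2&-4&-5\end{bmatrix}$$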

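7. In each part, a matrix in row echelon form is given. By inspection, find bases for the row and column spaces of A.

(a)

$$\begin{bmatrix}1&0&2\\0&0&1\\0&0&0\end{bmatrix}$$

(b)

$$\begin{bmatrix}1&-3&0&0\\0&1&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}$$

(c)

$$\begin{bmatrix}1&2&4&5\\0&1&-3&0\\0&0&1&-3\\0&0&0&1\\0&0&0&0\end{bmatrix}$$

(d)

$$\begin{bmatrix}1&2&-1&5\\0&1&4&3\\0&0&1&-7\\0&0&0&1\end{bmatrix}$$

Answer:

(a) $\mathbf{r}_1 = \begin{bmatrix}1&0&2\end{bmatrix}$, $\mathbf{r}_2 = \begin{bmatrix}0&0&1\end{bmatrix}$;

$$\mathbf{c}_1 = \begin{bmatrix}1\\0\\0\end{bmatrix},\quad \mathbf{c}_3 = \begin{bmatrix}2\\1\\0\end{bmatrix}$$

(b) $\mathbf{r}_1 = \begin{bmatrix}1&-3&0&0\end{bmatrix}$, $\mathbf{r}_2 = \begin{bmatrix}0&1&0&0\end{bmatrix}$;

$$\mathbf{c}_1 = \begin{bmatrix}1\\0\\0\\0\end{bmatrix},\quad \mathbf{c}_2 = \begin{bmatrix}-3\\1\\0\\0\end{bmatrix}$$

(c) $\mathbf{r}_1 = \begin{bmatrix}1&2&4&5\end{bmatrix}$, $\mathbf{r}_2 = \begin{bmatrix}0&1&-3&0\end{bmatrix}$, $\mathbf{r}_3 = \begin{bmatrix}0&0&1&-3\end{bmatrix}$, $\mathbf{r}_4 = \begin{bmatrix}0&0&0&1\end{bmatrix}$;

$$\mathbf{c}_1 = \begin{bmatrix}1\\0\\0\\0\\0\end{bmatrix},\quad \mathbf{c}_2 = \begin{bmatrix}2\\1\\0\\0\\0\end{bmatrix},\quad \mathbf{c}_3 = \begin{bmatrix}4\\-3\\1\\0\\0\end{bmatrix},\quad \mathbf{c}_4 = \begin{bmatrix}5\\0\\-3\\1\\0\end{bmatrix}$$

(d) $\mathbf{r}_1 = \begin{bmatrix}1&2&-1&5\end{bmatrix}$, $\mathbf{r}_2 = \begin{bmatrix}0&1&4&3\end{bmatrix}$, $\mathbf{r}_3 = \begin{bmatrix}0&0&1&-7\end{bmatrix}$, $\mathbf{r}_4 = \begin{bmatrix}0&0&0&1\end{bmatrix}$;

$$\mathbf{c}_1 = \begin{bmatrix}1\\0\\0\\0\end{bmatrix},\quad \mathbf{c}_2 = \begin{bmatrix}2\\1\\0\\0\end{bmatrix},\quad \mathbf{c}_3 = \begin{bmatrix}-1\\4\\1\\0\end{bmatrix},\quad \mathbf{c}_4 = \begin{bmatrix}5\\3\\-7\\1\end{bmatrix}$$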


8. For the matrices in Exercise 6, find a basis for the row space of A by reducing the matrix to row echelon form. 

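9. By inspection, find a basis for the row space and a basis for the column space of each matrix.

(a)

$$\begin{bmatrix}1&0&2\\0&0&1\\0&0&0\end{bmatrix}$$

(b)

$$\begin{bmatrix}1&-3&0&0\\0&1&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}$$

(c)

$$\begin{bmatrix}1&2&4&5\\0&1&-3&0\\0&0&1&-3\\0&0&0&1\\0&0&0&0\end{bmatrix}$$

(d)

$$\begin{bmatrix}1&2&-1&5\\0&1&4&3\\0&0&1&-7\\0&0&0&1\end{bmatrix}$$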


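Answer:

(a) $\mathbf{r}_1 = \begin{bmatrix}1&0&2\end{bmatrix}$; $\mathbf{r}_2 = \begin{bmatrix}0&0&1\end{bmatrix}$;

$$\mathbf{c}_1 = \begin{bmatrix}1\\0\\0\end{bmatrix};\quad \mathbf{c}_3 = \begin{bmatrix}2\\1\\0\end{bmatrix}$$

(b) $\mathbf{r}_1 = \begin{bmatrix}1&-3&0&0\end{bmatrix}$; $\mathbf{r}_2 = \begin{bmatrix}0&1&0&0\end{bmatrix}$;

$$\mathbf{c}_1 = \begin{bmatrix}1\\0\\0\\0\end{bmatrix};\quad \mathbf{c}_2 = \begin{bmatrix}-3\\1\\0\\0\end{bmatrix}$$

(c) $\mathbf{r}_1 = \begin{bmatrix}1&2&4&5\end{bmatrix}$; $\mathbf{r}_2 = \begin{bmatrix}0&1&-3&0\end{bmatrix}$; $\mathbf{r}_3 = \begin{bmatrix}0&0&1&-3\end{bmatrix}$; $\mathbf{r}_4 = \begin{bmatrix}0&0&0&1\end{bmatrix}$;

$$\mathbf{c}_1 = \begin{bmatrix}1\\0\\0\\0\\0\end{bmatrix};\quad \mathbf{c}_2 = \begin{bmatrix}2\\1\\0\\0\\0\end{bmatrix};\quad \mathbf{c}_3 = \begin{bmatrix}4\\-3\\1\\0\\0\end{bmatrix};\quad \mathbf{c}_4 = \begin{bmatrix}5\\0\\-3\\1\\0\end{bmatrix}$$

(d) $\mathbf{r}_1 = \begin{bmatrix}1&2&-1&5\end{bmatrix}$; $\mathbf{r}_2 = \begin{bmatrix}0&1&4&3\end{bmatrix}$; $\mathbf{r}_3 = \begin{bmatrix}0&0&1&-7\end{bmatrix}$; $\mathbf{r}_4 = \begin{bmatrix}0&0&0&1\end{bmatrix}$;

$$\mathbf{c}_1 = \begin{bmatrix}1\\0\\0\\0\end{bmatrix};\quad \mathbf{c}_2 = \begin{bmatrix}2\\1\\0\\0\end{bmatrix};\quad \mathbf{c}_3 = \begin{bmatrix}-1\\4\\1\\0\end{bmatrix};\quad \mathbf{c}_4 = \begin{bmatrix}5\\3\\-7\\1\end{bmatrix}$$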


10. For the matrices in Exercise 6, find a basis for the row space of A consisting entirely of row vectors of A.

11. Find a basis for the subspace of $R^4$ spanned by the given vectors.

(a) (1, 1, -4, -3), (2, 0, 2, -2), (2, -1, 3, 2)

(b) (-1, 1, -2, 0), (3, 3, 6, 0), (9, 0, 0, 3)

(c) (1, 1, 0, 0), (0, 0, 1, 1), (-2, 0, 2, 2), (0, -3, 0, 3)

Answer:

(a) (1, 1, -4, -3), (0, 1, -5, -2), (0, 0, 1, -1/2)

(b) (1, -1, 2, 0), (0, 1, 0, 0), (0, 0, 1, -1/6)

(c) (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)

12. Find a subset of the vectors that forms a basis for the space spanned by the vectors; then express each vector that is
not in the basis as a linear combination of the basis vectors.

(a) $\mathbf{v}_1 = (1, 0, 1, 1)$, $\mathbf{v}_2 = (-3, 3, 7, 1)$, $\mathbf{v}_3 = (-1, 3, 9, 3)$, $\mathbf{v}_4 = (-5, 3, 5, -1)$

(b) $\mathbf{v}_1 = (1, -2, 0, 3)$, $\mathbf{v}_2 = (2, -4, 0, 6)$, $\mathbf{v}_3 = (-1, 1, 2, 0)$, $\mathbf{v}_4 = (0, -1, 2, 3)$

(c) $\mathbf{v}_1 = (1, -1, 5, 2)$, $\mathbf{v}_2 = (-2, 3, 1, 0)$, $\mathbf{v}_3 = (4, -5, 9, 4)$, $\mathbf{v}_4 = (0, 4, 2, -3)$, $\mathbf{v}_5 = (-7, 18, 2, -8)$

13. Prove that the row vectors of an n x n invertible matrix A form a basis for $R^n$.

14. Construct a matrix whose null space consists of all linear combinations of the vectors

$$\mathbf{v}_1 = \begin{bmatrix}1\\-1\\3\\2\end{bmatrix}\quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix}2\\0\\-2\\4\end{bmatrix}$$


15. (a) Let

$$A = \begin{bmatrix}0&1&0\\1&0&0\\0&0&0\end{bmatrix}$$

Show that relative to an xyz-coordinate system in 3-space the null space of A consists of all points on the z-axis
and that the column space consists of all points in the xy-plane (see the accompanying figure).

(b) Find a 3 x 3 matrix whose null space is the x-axis and whose column space is the yz-plane.


[Figure Ex-15: the null space of A (the z-axis) and the column space of A (the xy-plane) in an xyz-coordinate system.]


Answer:

(b)

$$\begin{bmatrix}0&0&0\\0&1&0\\0&0&1\end{bmatrix}$$


16. Find a 3 x 3 matrix whose null space is 

(a) a point. 

(b) a line. 

(c) a plane. 

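17. (a) Find all 2 x 2 matrices whose null space is the line 3x - 5y = 0.

(b) Sketch the null spaces of the following matrices:

$$A = \begin{bmatrix}1&4\\0&5\end{bmatrix},\quad B = \begin{bmatrix}1&0\\0&5\end{bmatrix},\quad C = \begin{bmatrix}6&2\\3&1\end{bmatrix},\quad D = \begin{bmatrix}0&0\\0&0\end{bmatrix}$$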


Answer:

(a)

$$\begin{bmatrix}3a & -5a\\ 3b & -5b\end{bmatrix}$$

for all real numbers a, b not both 0.

(b) Since A and B are invertible, their null spaces are the origin. The null space of C is the line 3x + y = 0. The null
space of D is the entire xy-plane.


18. The equation $x_1 + x_2 + x_3 = 1$ can be viewed as a linear system of one equation in three unknowns. Express its
general solution as a particular solution plus the general solution of the corresponding homogeneous system.
[Suggestion: Write the vectors in column form.]

19. Suppose that A and B are n x n matrices and A is invertible. Invent and prove a theorem that describes how the row
spaces of AB and B are related.

True-False Exercises 


In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 

(a) The span of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is the column space of the matrix whose column vectors are $\mathbf{v}_1, \ldots, \mathbf{v}_n$.

Answer: 

True 

(b) The column space of a matrix A is the set of solutions of Ax = b.
Answer: 

False 

(c) If R is the reduced row echelon form of A, then those column vectors of R that contain the leading l's form a basis for 
the column space of A. 














Answer: 


False 

(d) The set of nonzero row vectors of a matrix A is a basis for the row space of A. 

Answer: 

False 

(e) If A and B are n x n matrices that have the same row space, then A and B have the same column space. 

Answer: 

False 

(f) If E is an m x m elementary matrix and A is an m x n matrix, then the null space of EA is the same as the null space
of A.

Answer: 

True 

(g) If E is an m x m elementary matrix and A is an m x n matrix, then the row space of E A is the same as the row space 
of A. 

Answer: 

True 

(h) If E is an m x m elementary matrix and A is an m x n matrix, then the column space of EA is the same as the column
space of A.

Answer: 

False 

(i) The system Ax = b is inconsistent if and only if b is not in the column space of A. 

Answer: 

True 

(j) There is an invertible matrix A and a singular matrix B such that the row spaces of A and B are the same. 

Answer: 

False 





4.8 Rank, Nullity, and the Fundamental Matrix Spaces

In the last section we investigated relationships between a system of linear equations and the row space, column
space, and null space of its coefficient matrix. In this section we will be concerned with the dimensions of those
spaces. The results we obtain will provide a deeper insight into the relationship between a linear system and its
coefficient matrix.


Row and Column Spaces Have Equal Dimensions 

In Examples 6 and 7 of Section 4.7 we found that the row and column spaces of the matrix 

$$A = \begin{bmatrix}1&-3&4&-2&5&4\\2&-6&9&-1&8&2\\2&-6&9&-1&9&7\\-1&3&-4&2&-5&-4\end{bmatrix}$$

both have three basis vectors and hence are both three-dimensional. The fact that these spaces have the same 
dimension is not accidental, but rather a consequence of the following theorem. 


THEOREM 4.8.1 

The row space and column space of a matrix A have the same dimension. 


Proof

Let R be any row echelon form of A. It follows from Theorem 4.7.4 and Theorem 4.7.6(b) that

$$\dim(\text{row space of } A) = \dim(\text{row space of } R),\qquad \dim(\text{column space of } A) = \dim(\text{column space of } R)$$

so it suffices to show that the row and column spaces of R have the same dimension. But the dimension of the row
space of R is the number of nonzero rows, and by Theorem 4.7.5 the dimension of the column space of R is the
number of leading 1's. Since these two numbers are the same, the row and column spaces have the same dimension.
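As an illustration (not a proof), the sketch below counts the leading 1's produced by Gaussian elimination on the matrix A above and on its transpose; the two counts are the dimensions of the row space and the column space of A. The helper name `rank` is hypothetical, and exact arithmetic via `fractions.Fraction` is an assumption made so that no rounding can hide a zero pivot.

```python
from fractions import Fraction


def rank(rows):
    """Number of leading entries in a row echelon form, computed exactly."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, m):  # eliminate below the leading entry
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]
AT = [list(col) for col in zip(*A)]
# rank(A) = dim(row space of A); rank(AT) = dim(row space of A^T) = dim(column space of A)
print(rank(A), rank(AT))
```

Both counts come out equal, matching the three basis vectors found for each space in Examples 6 and 7 of Section 4.7.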


Rank and Nullity 

The dimensions of the row space, column space, and null space of a matrix are such important numbers that there is 
some notation and terminology associated with them. 




DEFINITION 1 


The common dimension of the row space and column space of a matrix A is called the rank of A and is
denoted by rank(A); the dimension of the null space of A is called the nullity of A and is denoted by
nullity(A).


The proof of Theorem 4.8.1 shows that the rank 
of A can be interpreted as the number of leading 
1's in any row echelon form of A.


EXAMPLE 1 Rank and Nullity of a 4 x 6 Matrix 


Find the rank and nullity of the matrix

$$A = \begin{bmatrix}-1&2&0&4&5&-3\\3&-7&2&0&1&4\\2&-5&2&4&6&1\\4&-9&2&-4&-4&7\end{bmatrix}$$


The reduced row echelon form of A is

$$\begin{bmatrix}1&0&-4&-28&-37&13\\0&1&-2&-12&-16&5\\0&0&0&0&0&0\\0&0&0&0&0&0\end{bmatrix} \tag{1}$$


(verify). Since this matrix has two leading 1's, its row and column spaces are two-dimensional and
rank(A) = 2. To find the nullity of A, we must find the dimension of the solution space of the linear
system Ax = 0. This system can be solved by reducing its augmented matrix to reduced row echelon
form. The resulting matrix will be identical to (1), except that it will have an additional last column of
zeros, and hence the corresponding system of equations will be

$$\begin{aligned} x_1 - 4x_3 - 28x_4 - 37x_5 + 13x_6 &= 0\\ x_2 - 2x_3 - 12x_4 - 16x_5 + 5x_6 &= 0 \end{aligned}$$

Solving these equations for the leading variables yields

$$\begin{aligned} x_1 &= 4x_3 + 28x_4 + 37x_5 - 13x_6\\ x_2 &= 2x_3 + 12x_4 + 16x_5 - 5x_6 \end{aligned}$$

from which we obtain the general solution

$$\begin{aligned} x_1 &= 4r + 28s + 37t - 13u\\ x_2 &= 2r + 12s + 16t - 5u\\ x_3 &= r\\ x_4 &= s\\ x_5 &= t\\ x_6 &= u \end{aligned}$$







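or in column vector form

$$\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\end{bmatrix} = r\begin{bmatrix}4\\2\\1\\0\\0\\0\end{bmatrix} + s\begin{bmatrix}28\\12\\0\\1\\0\\0\end{bmatrix} + t\begin{bmatrix}37\\16\\0\\0\\1\\0\end{bmatrix} + u\begin{bmatrix}-13\\-5\\0\\0\\0\\1\end{bmatrix} \tag{3}$$

Because the four vectors on the right side of (3) form a basis for the solution space, nullity(A) = 4.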
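The computation in Example 1 can be sketched in Python: reduce A exactly, read off the pivot columns, and build one null-space basis vector per free variable by setting that variable to 1 and the other free variables to 0. This is a minimal sketch under our own assumptions; `rref` and `null_space_basis` are hypothetical helper names, and `fractions.Fraction` is used so the entries match the hand computation exactly.

```python
from fractions import Fraction


def rref(rows):
    """Gauss-Jordan elimination; returns (R, pivots) with 0-based pivot columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):  # clear column c everywhere except the pivot row
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def null_space_basis(rows):
    """Return (pivots, basis): one basis vector of the null space per free variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)  # this free variable is 1, the other free variables are 0
        for i, p in enumerate(pivots):
            x[p] = -R[i][f]  # solve row i of R x = 0 for its leading variable
        basis.append(x)
    return pivots, basis


A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]
pivots, N = null_space_basis(A)
print("rank =", len(pivots), " nullity =", len(N))
for x in N:
    print([int(v) for v in x])
```

The four printed vectors are the four column vectors on the right side of (3), and rank plus nullity equals the number of columns, 6.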


EXAMPLE 2 Maximum Value for Rank 

What is the maximum possible rank of an m x n matrix A that is not square? 

Since the row vectors of A lie in $R^n$ and the column vectors in $R^m$, the row space of A is
at most n-dimensional and the column space is at most m-dimensional. Since the rank of A is the
common dimension of its row and column space, it follows that the rank is at most the smaller of m
and n. We denote this by writing

$$\operatorname{rank}(A) \le \min(m, n)$$

in which min(m, n) is the minimum of m and n.


The following theorem establishes an important relationship between the rank and nullity of a matrix. 


THEOREM 4.8.2 Dimension Theorem for Matrices

If A is a matrix with n columns, then

$$\operatorname{rank}(A) + \operatorname{nullity}(A) = n \tag{4}$$


Proof

Since A has n columns, the homogeneous linear system Ax = 0 has n unknowns (variables). These fall into
two distinct categories: the leading variables and the free variables. Thus,

$$\begin{bmatrix}\text{number of leading}\\ \text{variables}\end{bmatrix} + \begin{bmatrix}\text{number of free}\\ \text{variables}\end{bmatrix} = n$$

But the number of leading variables is the same as the number of leading 1's in the reduced row echelon form of A,
which is the rank of A; and the number of free variables is the same as the number of parameters in the general
solution of Ax = 0, which is the nullity of A. This yields Formula (4).

















EXAMPLE 3 The Sum of Rank and Nullity 

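The matrix

$$A = \begin{bmatrix}-1&2&0&4&5&-3\\3&-7&2&0&1&4\\2&-5&2&4&6&1\\4&-9&2&-4&-4&7\end{bmatrix}$$

has 6 columns, so

$$\operatorname{rank}(A) + \operatorname{nullity}(A) = 6$$

This is consistent with Example 1, where we showed that

$$\operatorname{rank}(A) = 2 \quad\text{and}\quad \operatorname{nullity}(A) = 4$$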


The following theorem, which summarizes results already obtained, interprets rank and nullity in the context of a 
homogeneous linear system. 


THEOREM 4.8.3 

If A is an m x n matrix, then 

(a) rank(A) = the number of leading variables in the general solution of Ax = 0.

(b) nullity(A) = the number of parameters in the general solution of Ax = 0.

EXAMPLE 4 Number of Parameters in a General Solution 

Find the number of parameters in the general solution of Ax = 0 if A is a 5 x 7 matrix of rank 3.

Solution

From (4),

$$\operatorname{nullity}(A) = n - \operatorname{rank}(A) = 7 - 3 = 4$$

Thus, there are four parameters.


Equivalence Theorem 

In Theorem 2.3.8 we listed seven results that are equivalent to the invertibility of a square matrix A. We are now in 
a position to add eight more results to that list to produce a single theorem that summarizes most of the topics we 
have covered thus far. 




THEOREM 4.8.4 Equivalent Statements 


If A is an n x n matrix, then the following statements are equivalent. 

(a) A is invertible. 

(b) Ax = 0 has only the trivial solution. 

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) Ax = b is consistent for every n x 1 matrix b.

(f) Ax = b has exactly one solution for every n x 1 matrix b.

(g) det(A) ≠ 0.

(h) The column vectors of A are linearly independent. 

(i) The row vectors of A are linearly independent. 

(j) The column vectors of A span $R^n$.

(k) The row vectors of A span $R^n$.

(l) The column vectors of A form a basis for $R^n$.

(m) The row vectors of A form a basis for $R^n$.

(n) A has rank n. 

(o) A has nullity 0. 

Proof

The equivalence of (h) through (m) follows from Theorem 4.5.4 (we omit the details). To complete the
proof we will show that (b), (n), and (o) are equivalent by proving the chain of implications
(b) ⇒ (o) ⇒ (n) ⇒ (b).

(b) ⇒ (o) If Ax = 0 has only the trivial solution, then there are no parameters in that solution, so nullity(A) = 0
by Theorem 4.8.3(b).

(o) ⇒ (n) By Theorem 4.8.2, rank(A) = n - nullity(A) = n - 0 = n.

(n) ⇒ (b) If A has rank n, then Theorem 4.8.3(a) implies that there are n leading variables (hence no free variables)
in the general solution of Ax = 0. This leaves the trivial solution as the only possibility.


Overdetermined and Underdetermined Systems 

In many applications the equations in a linear system correspond to physical constraints or conditions that must be 
satisfied. In general, the most desirable systems are those that have the same number of constraints as unknowns, 
since such systems often have a unique solution. Unfortunately, it is not always possible to match the number of 
constraints and unknowns, so researchers are often faced with linear systems that have more constraints than 
unknowns, called overdetermined systems, or with fewer constraints than unknowns, called underdetermined
systems. The following two theorems will help us to analyze both overdetermined and underdetermined systems. 


In engineering and other applications, the 
occurrence of an overdetermined or 
underdetermined linear system often signals that 
one or more variables were omitted in formulating 
the problem or that extraneous variables were 
included. This often leads to some kind of 
undesirable physical result. 


THEOREM 4.8.5 

If Ax = b is a consistent linear system of m equations in n unknowns, and if A has rank r, then the general
solution of the system contains n - r parameters.

Proof

It follows from Theorem 4.7.2 that the number of parameters is equal to the nullity of A, which, by
Theorem 4.8.2, is n - r.


THEOREM 4.8.6 

Let A be an m x n matrix.

(a) (Overdetermined Case) If m > n, then the linear system Ax = b is inconsistent for at least one vector
b in $R^m$.

(b) (Underdetermined Case) If m < n, then for each vector b in $R^m$ the linear system Ax = b is either
inconsistent or has infinitely many solutions.

Proof

(a) Assume that m > n, in which case the column vectors of A cannot span $R^m$ (fewer vectors than the
dimension of $R^m$). Thus, there is at least one vector b in $R^m$ that is not in the column space of A, and for that b the
system Ax = b is inconsistent by Theorem 4.7.1.

(b) Assume that m < n. For each vector b in $R^m$ there are two possibilities: either the system Ax = b is
consistent or it is inconsistent. If it is inconsistent, then the proof is complete. If it is consistent, then Theorem 4.8.5
implies that the general solution has n - r parameters, where r = rank(A). But by Example 2, rank(A) is at most the
smaller of m and n, which here is m, so

$$n - r \ge n - m > 0$$

This means that the general solution has at least one parameter and hence there are infinitely many solutions.


EXAMPLE 5 Overdetermined and Underdetermined Systems 


(a) What can you say about the solutions of an overdetermined system Ax = b of 7 equations in 5
unknowns in which A has rank r = 4?

(b) What can you say about the solutions of an underdetermined system Ax = b of 5 equations in 7
unknowns in which A has rank r = 4?

Solution

(a) The system is consistent for some vector b in $R^7$, and for any such b the number of parameters in
the general solution is n - r = 5 - 4 = 1.

(b) The system may be consistent or inconsistent, but if it is consistent for the vector b in $R^5$, then the
general solution has n - r = 7 - 4 = 3 parameters.

EXAMPLE 6 An Overdetermined System 

The linear system 


$$\begin{aligned} x_1 - 2x_2 &= b_1\\ x_1 - x_2 &= b_2\\ x_1 + x_2 &= b_3\\ x_1 + 2x_2 &= b_4\\ x_1 + 3x_2 &= b_5 \end{aligned}$$


is overdetermined, so it cannot be consistent for all possible values of $b_1, b_2, b_3, b_4$, and $b_5$. Exact
conditions under which the system is consistent can be obtained by solving the linear system by
Gauss-Jordan elimination. We leave it for you to show that the augmented matrix is row equivalent to


$$\left[\begin{array}{cc|c} 1 & 0 & 2b_2 - b_1\\ 0 & 1 & b_2 - b_1\\ 0 & 0 & b_3 - 3b_2 + 2b_1\\ 0 & 0 & b_4 - 4b_2 + 3b_1\\ 0 & 0 & b_5 - 5b_2 + 4b_1 \end{array}\right] \tag{5}$$

Thus, the system is consistent if and only if $b_1, b_2, b_3, b_4$, and $b_5$ satisfy the conditions

$$\begin{aligned} 2b_1 - 3b_2 + b_3 &= 0\\ 3b_1 - 4b_2 + b_4 &= 0\\ 4b_1 - 5b_2 + b_5 &= 0 \end{aligned}$$

Solving this homogeneous linear system yields

$$b_1 = 5r - 4s,\quad b_2 = 4r - 3s,\quad b_3 = 2r - s,\quad b_4 = r,\quad b_5 = s$$

where r and s are arbitrary.

The coefficient matrix for the linear system in the last example has n = 2 columns, and it has rank r = 2
because there are two nonzero rows in its reduced row echelon form. This implies that when the system is
consistent its general solution will contain n - r = 0 parameters; that is, the solution will be unique. With a
moment's thought, you should be able to see that this is so from (5).
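A practical way to check consistency for a particular right-hand side is the standard rank test: Ax = b is consistent exactly when appending b as an extra column does not raise the rank, that is, when rank(A) = rank([A | b]). The sketch below applies that test to the system of Example 6. It is an illustration under our own assumptions: `rank` and `is_consistent` are hypothetical helper names, and exact `fractions.Fraction` arithmetic is used.

```python
from fractions import Fraction


def rank(rows):
    """Number of leading entries in a row echelon form, computed exactly."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, m):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


def is_consistent(A, b):
    """Ax = b is consistent exactly when rank(A) == rank([A | b])."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)


A = [[1, -2], [1, -1], [1, 1], [1, 2], [1, 3]]   # coefficient matrix of Example 6
r, s = 2, -1
b_good = [5*r - 4*s, 4*r - 3*s, 2*r - s, r, s]   # built from the parametric solution
b_bad = [1, 0, 0, 0, 0]                          # violates 2b1 - 3b2 + b3 = 0
print(is_consistent(A, b_good), is_consistent(A, b_bad))
print("parameters when consistent:", len(A[0]) - rank(A))
```

The first right-hand side passes and the second fails, and the parameter count n - r is 0, so whenever the system is consistent its solution is unique.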




The Fundamental Spaces of a Matrix 

There are six important vector spaces associated with a matrix A and its transpose $A^T$:

row space of A          row space of $A^T$
column space of A       column space of $A^T$
null space of A         null space of $A^T$

However, transposing a matrix converts row vectors into column vectors and conversely, so except for a difference
in notation, the row space of $A^T$ is the same as the column space of A, and the column space of $A^T$ is the same as
the row space of A. Thus, of the six spaces listed above, only the following four are distinct:

row space of A          column space of A
null space of A         null space of $A^T$

These are called the fundamental spaces of a matrix A. We will conclude this section by discussing how these four
subspaces are related.

If A is an m x n matrix, then the row space and
null space of A are subspaces of $R^n$, and the
column space of A and the null space of $A^T$ are
subspaces of $R^m$.

Let us focus for a moment on the matrix $A^T$. Since the row space and column space of a matrix have the same
dimension, and since transposing a matrix converts its columns to rows and its rows to columns, the following
result should not be surprising.


THEOREM 4.8.7 

If A is any matrix, then rank(A) = rank($A^T$).

Proof

$$\operatorname{rank}(A) = \dim(\text{row space of } A) = \dim(\text{column space of } A^T) = \operatorname{rank}(A^T)$$



This result has some important implications. For example, if A is an m x n matrix, then applying Formula (4) to the
matrix $A^T$ and using the fact that this matrix has m columns yields

$$\operatorname{rank}(A^T) + \operatorname{nullity}(A^T) = m$$

which, by virtue of Theorem 4.8.7, can be rewritten as

$$\operatorname{rank}(A) + \operatorname{nullity}(A^T) = m \tag{6}$$

This alternative form of Formula (4) in Theorem 4.8.2 makes it possible to express the dimensions of all four
fundamental spaces in terms of the size and rank of A. Specifically, if rank(A) = r, then

$$\begin{aligned} \dim[\operatorname{row}(A)] &= r, & \dim[\operatorname{col}(A)] &= r\\ \dim[\operatorname{null}(A)] &= n - r, & \dim[\operatorname{null}(A^T)] &= m - r \end{aligned} \tag{7}$$

The four formulas in (7) provide an algebraic relationship between the size of a matrix and the dimensions of its
fundamental spaces. Our next objective is to find a geometric relationship between the fundamental spaces
themselves. For this purpose recall from Theorem 3.4.3 that if A is an m x n matrix, then the null space of A
consists of those vectors that are orthogonal to each of the row vectors of A. To develop that idea in more detail, we
make the following definition.


DEFINITION 2 

If W is a subspace of $R^n$, then the set of all vectors in $R^n$ that are orthogonal to every vector in W is called
the orthogonal complement of W and is denoted by the symbol $W^\perp$.


The following theorem lists three basic properties of orthogonal complements. We will omit the formal proof 
because a more general version of this theorem will be given later in the text. 


THEOREM 4.8.8 

If W is a subspace of $R^n$, then:

(a) $W^\perp$ is a subspace of $R^n$.

(b) The only vector common to W and $W^\perp$ is 0.

(c) The orthogonal complement of $W^\perp$ is W.


EXAMPLE 7 Orthogonal Complements 

In $R^2$ the orthogonal complement of a line W through the origin is the line through the origin that is
perpendicular to W (Figure 4.8.1a); and in $R^3$ the orthogonal complement of a plane W through the
origin is the line through the origin that is perpendicular to that plane (Figure 4.8.1b).




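[Figure 4.8.1: (a) a line W through the origin in $R^2$ and its orthogonal complement $W^\perp$; (b) a plane W through the origin in $R^3$ and its orthogonal complement $W^\perp$.]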


Explain why {0} and $R^n$ are orthogonal
complements.


A Geometric Link Between the Fundamental Spaces 

The following theorem provides a geometric link between the fundamental spaces of a matrix. Part (a) is essentially
a restatement of Theorem 3.4.3 in the language of orthogonal complements, and part (b), whose proof is left as an
exercise, follows from part (a). The essential idea of the theorem is illustrated in Figure 4.8.2.


THEOREM 4.8.9 

If A is an m x n matrix, then: 

(a) The null space of A and the row space of A are orthogonal complements in $R^n$.

(b) The null space of $A^T$ and the column space of A are orthogonal complements in $R^m$.
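The check below does not prove Theorem 4.8.9; it only confirms both parts numerically for the matrix of Example 1, using exact arithmetic. The helpers `rref`, `null_space_basis`, and `dot` are hypothetical names defined in the sketch. Orthogonality alone does not make two subspaces complements, so the sketch also confirms that their dimensions add up to n and to m, as Formula (7) predicts.

```python
from fractions import Fraction


def rref(rows):
    """Gauss-Jordan elimination; returns (R, pivots) with 0-based pivot columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def null_space_basis(rows):
    """Return (pivots, basis) with one null-space basis vector per free variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for i, p in enumerate(pivots):
            x[p] = -R[i][f]
        basis.append(x)
    return pivots, basis


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]
AT = [list(col) for col in zip(*A)]  # rows of AT are the columns of A
m, n = len(A), len(A[0])

pivots, N = null_space_basis(A)   # basis for null(A), a subspace of R^n
_, NT = null_space_basis(AT)      # basis for null(A^T), a subspace of R^m
r = len(pivots)

# Part (a): null(A) is orthogonal to every row of A, and r + nullity(A) = n.
print(all(dot(row, x) == 0 for row in A for x in N), r + len(N) == n)
# Part (b): null(A^T) is orthogonal to every column of A, and r + nullity(A^T) = m.
print(all(dot(col, y) == 0 for col in AT for y in NT), r + len(NT) == m)
```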




More on the Equivalence Theorem 









As our final result in this section, we will add two more statements to Theorem 4.8.4. We leave the proof that those 
statements are equivalent to the rest as an exercise. 


Equivalent Statements 

If A is an n x n matrix, then the following statements are equivalent. 

(a) A is invertible. 

(b) Ax = 0 has only the trivial solution. 

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) Ax = b is consistent for every n x 1 matrix b.

(f) Ax = b has exactly one solution for every n x 1 matrix b.

(g) det(A) ≠ 0.

(h) The column vectors of A are linearly independent.

(i) The row vectors of A are linearly independent.

(j) The column vectors of A span $R^n$.

(k) The row vectors of A span $R^n$.

(l) The column vectors of A form a basis for $R^n$.

(m) The row vectors of A form a basis for $R^n$.

(n) A has rank n.

(o) A has nullity 0.

(p) The orthogonal complement of the null space of A is $R^n$.

(q) The orthogonal complement of the row space of A is {0}.


Applications of Rank 

The advent of the Internet has stimulated research on finding efficient methods for transmitting large amounts of 
digital data over communications lines with limited bandwidths. Digital data are commonly stored in matrix form, 
and many techniques for improving transmission speed use the rank of a matrix in some way. Rank plays a role 
because it measures the “redundancy” in a matrix in the sense that if A is an m x n matrix of rank k, then n - k of
the column vectors and m - k of the row vectors can be expressed in terms of k linearly independent column or
row vectors. The essential idea in many data compression schemes is to approximate the original data set by a data
set with smaller rank that conveys nearly the same information, then eliminate redundant vectors in the 
approximating set to speed up the transmission time. 
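The redundancy described above can be made concrete with an exact factorization A = CR, where C holds the k pivot columns of A and R holds the k nonzero rows of the reduced row echelon form. Storing C and R takes mk + kn numbers instead of mn. This sketch is only an illustration of the redundancy idea, under our own assumptions: `rref`, `cr_factorization`, and `matmul` are hypothetical helper names, and the factorization is lossless. It is not one of the approximation schemes the paragraph mentions, which replace A by a nearby matrix of smaller rank.

```python
from fractions import Fraction


def rref(rows):
    """Gauss-Jordan elimination; returns (R, pivots) with 0-based pivot columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def cr_factorization(A):
    """Return (C, R) with A = C R: C is m x k (pivot columns of A), R is k x n."""
    full, pivots = rref(A)
    C = [[row[p] for p in pivots] for row in A]
    R = full[:len(pivots)]  # the nonzero rows of the reduced row echelon form
    return C, R


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]
C, R = cr_factorization(A)
m, n, k = len(A), len(A[0]), len(C[0])
print(matmul(C, R) == A)
print("entries stored:", m * n, "->", m * k + k * n)
```

For this 4 x 6 matrix of rank 2 the saving is modest (24 entries become 20), but for a large matrix of small rank k the count mk + kn is far below mn.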


Concept Review 

Rank 

Nullity 

Dimension Theorem 
Overdetermined system 
Underdetermined system 
Fundamental spaces of a matrix 
Relationships among the fundamental spaces 
Orthogonal complement 

Equivalent characterizations of invertible matrices 

Skills 

Find the rank and nullity of a matrix. 

Find the dimension of the row space of a matrix. 


Exercise Set 4.8 

1. Verify that rank(A) = rank($A^T$).

$$A = \begin{bmatrix}1&2&4&0\\-3&1&5&2\\-2&3&9&2\end{bmatrix}$$

Answer:

rank(A) = rank($A^T$) = 2

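2. Find the rank and nullity of the matrix; then verify that the values obtained satisfy Formula (4) in the Dimension
Theorem.

(a)

$$A = \begin{bmatrix}1&-1&3\\5&-4&-4\\7&-6&2\end{bmatrix}$$

(b)

$$A = \begin{bmatrix}2&0&-1\\4&0&-2\\0&0&0\end{bmatrix}$$

(c)

$$A = \begin{bmatrix}1&4&5&2\\2&1&3&0\\-1&3&2&2\end{bmatrix}$$

(d)

$$A = \begin{bmatrix}1&4&5&6&9\\3&-2&1&4&-1\\-1&0&-1&-2&-1\\2&3&5&7&8\end{bmatrix}$$

(e)

$$A = \begin{bmatrix}1&-3&2&2&1\\0&3&6&0&-3\\2&-3&-2&4&4\\3&-6&0&6&5\\-2&9&2&-4&-5\end{bmatrix}$$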

3. In each part of Exercise 2, use the results obtained to find the number of leading variables and the number of 
parameters in the solution of Ax = 0 without solving the system. 

Answer: 

(a) 2; 1 

(b) 1; 2 

(c) 2; 2 

(d) 2; 3 

(e) 3; 2 

4. In each part, use the information in the table to find the dimension of the row space of A, column space of A,
null space of A, and null space of $A^T$.

               (a)   (b)   (c)   (d)   (e)   (f)   (g)
Size of A      3x3   3x3   3x3   5x9   9x5   4x4   6x2
Rank(A)        3     2     1     2     2     0     2


5. In each part, find the largest possible value for the rank of A and the smallest possible value for the nullity of A. 

(a) A is 4 x 4

(b) A is 3 x 5

(c) A is 5 x 3

Answer:

(a) Rank = 4, nullity = 0

(b) Rank = 3, nullity = 2

(c) Rank = 3, nullity = 0

6. If A is an m x n matrix, what is the largest possible value for its rank and the smallest possible value for its 
nullity? 

7. In each part, use the information in the table to determine whether the linear system Ax = b is consistent. If so,
state the number of parameters in its general solution.

               (a)   (b)   (c)   (d)   (e)   (f)   (g)
Size of A      3x3   3x3   3x3   5x9   5x9   4x4   6x2
Rank(A)        3     2     1     2     2     0     2
Rank[A | b]    3     3     1     2     3     0     2
















Answer: 


(a) Yes, 0 

(b) No 

(c) Yes, 2 

(d) Yes, 7 

(e) No 

(f) Yes, 4 

(g) Yes, 0 
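The answers above rest on the rank test for consistency: $Ax = b$ is consistent exactly when $\operatorname{rank}(A) = \operatorname{rank}[A \mid b]$, and then its general solution has $n - \operatorname{rank}(A)$ parameters. The sketch below is one way to check this on concrete matrices; it assumes exact rational entries (via Python's `fractions.Fraction`) so that no floating-point tolerance is needed, and the function names `rank` and `analyze_system` are illustrative, not from the text.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        # look for a nonzero entry in column c at or below row r
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # eliminate the entries below the pivot
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * p for a, p in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def analyze_system(A, b):
    """Return (consistent, number_of_parameters) for Ax = b, using rank(A) versus rank([A | b])."""
    n = len(A[0])
    rank_A = rank(A)
    rank_Ab = rank([list(row) + [bi] for row, bi in zip(A, b)])
    if rank_A != rank_Ab:
        return False, None
    return True, n - rank_A


print(analyze_system([[1, 2], [2, 4]], [3, 6]))  # (True, 1)
print(analyze_system([[1, 2], [2, 4]], [3, 7]))  # (False, None)
```

Because the elimination is exact, `rank` can also be used to confirm, for example, that the matrix in Exercise 1 has rank 2.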

8. For each of the matrices in Exercise 7, find the nullity of $A$, and determine the number of parameters in the general solution of the homogeneous linear system $Ax = 0$.

9. What conditions must be satisfied by $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$ for the overdetermined linear system

$$\begin{aligned} x_1 - 3x_2 &= b_1 \\ x_1 - 2x_2 &= b_2 \\ x_1 + x_2 &= b_3 \\ x_1 - 4x_2 &= b_4 \\ x_1 + 5x_2 &= b_5 \end{aligned}$$

to be consistent?

Answer:

$b_1 = r$, $b_2 = s$, $b_3 = 4s - 3r$, $b_4 = 2r - s$, $b_5 = 8s - 7r$
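One way to see where the answer comes from: the first two equations determine $x_2 = b_2 - b_1$ and $x_1 = 3b_2 - 2b_1$, and consistency means these values also satisfy the last three equations. The short sketch below checks this for integer choices of $r$ and $s$; the function and constant names are hypothetical, and exact integer arithmetic is assumed.

```python
# Each equation has the form x1 + c*x2 = b_i; these are the coefficients c of x2.
COEFFS = [-3, -2, 1, -4, 5]


def b_from_parameters(r, s):
    """Right-hand side (b1, ..., b5) described by the answer, with b1 = r and b2 = s."""
    return [r, s, 4 * s - 3 * r, 2 * r - s, 8 * s - 7 * r]


def solve_if_consistent(b):
    """Solve the first two equations for (x1, x2); return it only if all five equations hold."""
    x2 = b[1] - b[0]          # equation 2 minus equation 1 gives x2 = b2 - b1
    x1 = b[0] + 3 * x2        # back-substitute into equation 1
    if all(x1 + c * x2 == bi for c, bi in zip(COEFFS, b)):
        return (x1, x2)
    return None


print(solve_if_consistent(b_from_parameters(1, 2)))  # (4, 1)
print(solve_if_consistent([1, 2, 5, 0, 10]))         # None
```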



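10. Let

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$$

Show that $A$ has rank 2 if and only if one or more of the determinants

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \quad \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}$$

is nonzero.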

11. Suppose that $A$ is a 3 × 3 matrix whose null space is a line through the origin in 3-space. Can the row or column space of $A$ also be a line through the origin? Explain.

Answer: 

No 

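12. Discuss how the rank of $A$ varies with $t$.

(a) $$A = \begin{bmatrix} 1 & 1 & t \\ 1 & t & 1 \\ t & 1 & 1 \end{bmatrix}$$

(b) $$A = \begin{bmatrix} t & 3 & -1 \\ 3 & 6 & -2 \\ -1 & -3 & t \end{bmatrix}$$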



13. Are there values of $r$ and $s$ for which

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & r-2 & 2 \\ 0 & s-1 & r+2 \\ 0 & 0 & 3 \end{bmatrix}$$

has rank 1? Has rank 2? If so, find those values.

Answer:

The rank is 2 if $r = 2$ and $s = 1$; the rank is never 1.

14. Use the result in Exercise 10 to show that the set of points $(x, y, z)$ in $R^3$ for which the matrix

$$\begin{bmatrix} x & y & z \\ 1 & x & y \end{bmatrix}$$

has rank 1 is the curve with parametric equations $x = t$, $y = t^2$, $z = t^3$.

15. Prove: If $k \neq 0$, then $A$ and $kA$ have the same rank.

16. (a) Give an example of a 3 × 3 matrix whose column space is a plane through the origin in 3-space.

(b) What kind of geometric object is the null space of your matrix? 

(c) What kind of geometric object is the row space of your matrix? 

17. (a) If $A$ is a 3 × 5 matrix, then the number of leading 1's in the reduced row echelon form of $A$ is at most ____. Why?

(b) If $A$ is a 3 × 5 matrix, then the number of parameters in the general solution of $Ax = 0$ is at most ____. Why?

(c) If $A$ is a 5 × 3 matrix, then the number of leading 1's in the reduced row echelon form of $A$ is at most ____. Why?

(d) If $A$ is a 5 × 3 matrix, then the number of parameters in the general solution of $Ax = 0$ is at most ____. Why?

Answer: 

(a) 3 

(b) 5 

(c) 3 

(d) 3 


18. (a) If $A$ is a 3 × 5 matrix, then the rank of $A$ is at most ____. Why?

(b) If $A$ is a 3 × 5 matrix, then the nullity of $A$ is at most ____. Why?

(c) If $A$ is a 3 × 5 matrix, then the rank of $A^T$ is at most ____. Why?

(d) If $A$ is a 3 × 5 matrix, then the nullity of $A^T$ is at most ____. Why?


19. Find matrices $A$ and $B$ for which $\operatorname{rank}(A) = \operatorname{rank}(B)$, but $\operatorname{rank}(A^2) \neq \operatorname{rank}(B^2)$.


Answer: 



20. Prove: If a matrix $A$ is not square, then either the row vectors or the column vectors of $A$ are linearly dependent.

True-False Exercises 

In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 

(a) Either the row vectors or the column vectors of a square matrix are linearly independent. 

Answer: 

False 

(b) A matrix with linearly independent row vectors and linearly independent column vectors is square. 

Answer: 

True 

(c) The nullity of a nonzero $m \times n$ matrix is at most $m$.

Answer: 

False 

(d) Adding one additional column to a matrix increases its rank by one. 

Answer: 

False 

(e) The nullity of a square matrix with linearly dependent rows is at least one. 

Answer: 

True 

(f) If A is square and Ax = b is inconsistent for some vector b, then the nullity of A is zero. 

Answer: 

False 

(g) If a matrix A has more rows than columns, then the dimension of the row space is greater than the dimension of 
the column space. 

Answer: 

False 

(h) If $\operatorname{rank}(A^T) = \operatorname{rank}(A)$, then $A$ is square.

Answer: 

False 

(i) There is no 3 × 3 matrix whose row space and null space are both lines in 3-space.
Answer: 


True 

(j) If $V$ is a subspace of $R^n$ and $W$ is a subspace of $V$, then $W^{\perp}$ is a subspace of $V$.
Answer: 

False 
4.9 Matrix Transformations from $R^n$ to $R^m$

In this section we will study functions of the form $w = F(x)$, where the independent variable $x$ is a vector in $R^n$ and the dependent variable $w$ is a vector in $R^m$. We will concentrate on a special class of such functions called “matrix transformations.” Such transformations are fundamental in the study of linear algebra and have important applications in physics, engineering, social sciences, and various branches of mathematics.


Functions and Transformations 

Recall that a function is a rule that associates with each element of a set $A$ one and only one element in a set $B$. If $f$ associates the element $b$ with the element $a$, then we write

$$b = f(a)$$

and we say that $b$ is the image of $a$ under $f$ or that $f(a)$ is the value of $f$ at $a$. The set $A$ is called the domain of $f$ and the set $B$ the codomain of $f$ (Figure 4.9.1). The subset of the codomain that consists of all images of points in the domain is called the range of $f$.

Figure 4.9.1 Domain $A$, codomain $B$, and $b = f(a)$.

For many common functions the domain and codomain are sets of real numbers, but in this text we will be concerned 
with functions for which the domain and codomain are vector spaces. 


DEFINITION 1

If $V$ and $W$ are vector spaces, and if $f$ is a function with domain $V$ and codomain $W$, then we say that $f$ is a transformation from $V$ to $W$ or that $f$ maps $V$ to $W$, which we denote by writing

$$f: V \to W$$

In the special case where $V = W$, the transformation is also called an operator on $V$.

In this section we will be concerned exclusively with transformations from $R^n$ to $R^m$; transformations of general vector spaces will be considered in a later section. To illustrate one way in which such transformations can arise, suppose that $f_1, f_2, \ldots, f_m$ are real-valued functions of $n$ variables, say

$$\begin{aligned} w_1 &= f_1(x_1, x_2, \ldots, x_n) \\ w_2 &= f_2(x_1, x_2, \ldots, x_n) \\ &\;\;\vdots \\ w_m &= f_m(x_1, x_2, \ldots, x_n) \end{aligned} \qquad (1)$$

These $m$ equations assign a unique point $(w_1, w_2, \ldots, w_m)$ in $R^m$ to each point $(x_1, x_2, \ldots, x_n)$ in $R^n$ and thus define a transformation from $R^n$ to $R^m$. If we denote this transformation by $T$, then $T: R^n \to R^m$ and

$$T(x_1, x_2, \ldots, x_n) = (w_1, w_2, \ldots, w_m) \qquad (2)$$


Matrix Transformations


In the special case where the equations in 1 are linear, they can be expressed in the form

$$\begin{aligned} w_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ w_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ &\;\;\vdots \\ w_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{aligned} \qquad (3)$$

which we can write in matrix notation as

$$\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

or more briefly as

$$w = Ax \qquad (4)$$


Although we could view this as a linear system, we will view it instead as a transformation that maps the column vector $x$ in $R^n$ into the column vector $w$ in $R^m$ by multiplying $x$ on the left by $A$. We call this a matrix transformation (or matrix operator if $m = n$), and we denote it by $T_A: R^n \to R^m$. With this notation, Equation 4 can be expressed as

$$w = T_A(x) \qquad (5)$$

The matrix transformation $T_A$ is called multiplication by $A$, and the matrix $A$ is called the standard matrix for the transformation.

We will also find it convenient, on occasion, to express 5 in the schematic form

$$x \xrightarrow{\;T_A\;} w \qquad (6)$$

which is read “$T_A$ maps $x$ into $w$.”

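EXAMPLE 1 A Matrix Transformation from $R^4$ to $R^3$

The matrix transformation $T: R^4 \to R^3$ defined by the equations

$$\begin{aligned} w_1 &= 2x_1 - 3x_2 + x_3 - 5x_4 \\ w_2 &= 4x_1 + x_2 - 2x_3 + x_4 \\ w_3 &= 5x_1 - x_2 + 4x_3 \end{aligned} \qquad (7)$$

can be expressed in matrix form as

$$\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$$

so the standard matrix for $T$ is

$$A = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix} \qquad (8)$$

The image of a point $(x_1, x_2, x_3, x_4)$ can be computed directly from the defining equations 7 or from 8 by matrix multiplication. For example, if

$$(x_1, x_2, x_3, x_4) = (1, -3, 0, 2)$$

then substituting in 7 yields $w_1 = 1$, $w_2 = 3$, $w_3 = 8$ (verify), or alternatively from 8,

$$\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 8 \end{bmatrix}$$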
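The matrix-vector product in Example 1 can be reproduced with a few lines of plain Python. This is a minimal sketch that represents $A$ as a list of rows; the helper name `mat_vec` is an assumption of this example, not notation from the text.

```python
def mat_vec(A, x):
    """Return the product Ax, where A is a list of rows and x is a list of numbers."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have the same length as x")
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Standard matrix (8) of Example 1
A = [[2, -3, 1, -5],
     [4, 1, -2, 1],
     [5, -1, 4, 0]]

print(mat_vec(A, [1, -3, 0, 2]))  # [1, 3, 8]
```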




Some Notational Matters 

Sometimes we will want to denote a matrix transformation without giving a name to the matrix itself. In such cases we will denote the standard matrix for $T: R^n \to R^m$ by the symbol $[T]$. Thus, the equation

$$T(x) = [T]x \qquad (9)$$

is simply the statement that $T$ is a matrix transformation with standard matrix $[T]$, and the image of $x$ under this transformation is the product of the matrix $[T]$ and the column vector $x$.


Properties of Matrix Transformations 

The following theorem lists four basic properties of matrix transformations that follow from properties of matrix 
multiplication. 


THEOREM 4.9.1 

For every matrix $A$ the matrix transformation $T_A: R^n \to R^m$ has the following properties for all vectors $u$ and $v$ in $R^n$ and for every scalar $k$:

(a) $T_A(0) = 0$

(b) $T_A(ku) = kT_A(u)$ [Homogeneity property]

(c) $T_A(u + v) = T_A(u) + T_A(v)$ [Additivity property]

(d) $T_A(u - v) = T_A(u) - T_A(v)$


All four parts are restatements of familiar properties of matrix multiplication:

$$A0 = 0, \quad A(ku) = k(Au), \quad A(u + v) = Au + Av, \quad A(u - v) = Au - Av$$


It follows from Theorem 4.9.1 that a matrix transformation maps linear combinations of vectors in $R^n$ into the corresponding linear combinations in $R^m$ in the sense that

$$T_A(k_1u_1 + k_2u_2 + \cdots + k_ru_r) = k_1T_A(u_1) + k_2T_A(u_2) + \cdots + k_rT_A(u_r) \qquad (10)$$
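Formula 10 can be spot-checked numerically. The sketch below compares both sides for a small hypothetical 2 × 3 matrix and three vectors in $R^3$; the matrix, vectors, scalars, and helper names are made up for illustration, and a single numerical check is of course not a proof.

```python
def mat_vec(A, x):
    """Return Ax for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def lin_comb(coeffs, vectors):
    """Return k1*v1 + k2*v2 + ... + kr*vr for equal-length vectors."""
    return [sum(k * v[i] for k, v in zip(coeffs, vectors)) for i in range(len(vectors[0]))]


A = [[1, 2, 0],
     [0, -1, 3]]                          # hypothetical 2 x 3 matrix
us = [[1, 0, 2], [3, -1, 4], [0, 5, -2]]  # hypothetical vectors u1, u2, u3 in R^3
ks = [2, -1, 3]                           # hypothetical scalars k1, k2, k3

lhs = mat_vec(A, lin_comb(ks, us))               # T_A(k1 u1 + k2 u2 + k3 u3)
rhs = lin_comb(ks, [mat_vec(A, u) for u in us])  # k1 T_A(u1) + k2 T_A(u2) + k3 T_A(u3)
print(lhs, rhs, lhs == rhs)  # [31, -34] [31, -34] True
```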


Depending on whether $n$-tuples and $m$-tuples are regarded as vectors or points, the geometric effect of a matrix transformation $T_A: R^n \to R^m$ is to map each vector (point) in $R^n$ into a vector (point) in $R^m$ (Figure 4.9.2).

Figure 4.9.2 Left: $T$ maps vectors to vectors. Right: $T$ maps points to points.


The following theorem states that if two matrix transformations from $R^n$ to $R^m$ have the same image at each point of $R^n$, then the matrices themselves must be the same.


THEOREM 4.9.2

If $T_A: R^n \to R^m$ and $T_B: R^n \to R^m$ are matrix transformations, and if $T_A(x) = T_B(x)$ for every vector $x$ in $R^n$, then $A = B$.


To say that $T_A(x) = T_B(x)$ for every vector in $R^n$ is the same as saying that

$$Ax = Bx$$

for every vector $x$ in $R^n$. This is true, in particular, if $x$ is any of the standard basis vectors $e_1, e_2, \ldots, e_n$ for $R^n$; that is,

$$Ae_j = Be_j \quad (j = 1, 2, \ldots, n) \qquad (11)$$

Since every entry of $e_j$ is 0 except for the $j$th, which is 1, it follows from Theorem 1.3.1 that $Ae_j$ is the $j$th column of $A$ and $Be_j$ is the $j$th column of $B$. Thus, it follows from 11 that corresponding columns of $A$ and $B$ are the same, and hence that $A = B$.






EXAMPLE 2 Zero Transformations

If $0$ is the $m \times n$ zero matrix, then

$$T_0(x) = 0x = 0$$

so multiplication by zero maps every vector in $R^n$ into the zero vector in $R^m$. We call $T_0$ the zero transformation from $R^n$ to $R^m$.


EXAMPLE 3 Identity Operators

If $I$ is the $n \times n$ identity matrix, then

$$T_I(x) = Ix = x$$

so multiplication by $I$ maps every vector in $R^n$ into itself. We call $T_I$ the identity operator on $R^n$.


A Procedure for Finding Standard Matrices 

There is a way of finding the standard matrix for a matrix transformation from $R^n$ to $R^m$ by considering the effect of that transformation on the standard basis vectors for $R^n$. To explain the idea, suppose that $A$ is unknown and that

$$e_1, e_2, \ldots, e_n$$

are the standard basis vectors for $R^n$. Suppose also that the images of these vectors under the transformation $T_A$ are

$$T_A(e_1) = Ae_1, \quad T_A(e_2) = Ae_2, \quad \ldots, \quad T_A(e_n) = Ae_n$$

It follows from Theorem 1.3.1 that $Ae_j$ is a linear combination of the columns of $A$ in which the successive coefficients are the entries of $e_j$. But all entries of $e_j$ are zero except the $j$th, so the product $Ae_j$ is just the $j$th column of the matrix $A$. Thus,

$$A = [\,T_A(e_1) \mid T_A(e_2) \mid \cdots \mid T_A(e_n)\,] \qquad (12)$$

In summary, we have the following procedure for finding the standard matrix for a matrix transformation:


Finding the Standard Matrix for a Matrix Transformation

Step 1. Find the images of the standard basis vectors $e_1, e_2, \ldots, e_n$ for $R^n$ in column form.

Step 2. Construct the matrix that has the images obtained in Step 1 as its successive columns. This matrix is the standard matrix for the transformation.
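The two-step procedure translates directly into code: evaluate $T$ at each $e_j$ and use the results as successive columns. The sketch below assumes $T$ is given as a Python function on lists or tuples and that it really is a matrix transformation; applied to a nonlinear map it still returns a matrix, but that matrix will not reproduce $T$. The name `standard_matrix` is hypothetical.

```python
def standard_matrix(T, n):
    """Build [T] for T: R^n -> R^m by using T(e1), ..., T(en) as successive columns."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # j-th standard basis vector
        columns.append(list(T(e_j)))
    m = len(columns[0])
    # Column j of [T] is T(e_j); convert the list of columns to a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def T(x):
    """The transformation of Example 1, written from its defining equations (7)."""
    x1, x2, x3, x4 = x
    return (2 * x1 - 3 * x2 + x3 - 5 * x4,
            4 * x1 + x2 - 2 * x3 + x4,
            5 * x1 - x2 + 4 * x3)


print(standard_matrix(T, 4))
# [[2, -3, 1, -5], [4, 1, -2, 1], [5, -1, 4, 0]]
```

Feeding it the defining equations of Example 1 recovers the standard matrix (8).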


Reflection Operators


Some of the most basic matrix operators on $R^2$ and $R^3$ are those that map each point into its symmetric image about a fixed line or a fixed plane; these are called reflection operators. Table 1 shows the standard matrices for the reflections about the coordinate axes in $R^2$, and Table 2 shows the standard matrices for the reflections about the coordinate planes in $R^3$. In each case the standard matrix was obtained by finding the images of the standard basis vectors, converting those images to column vectors, and then using those column vectors as successive columns of the standard matrix.


Table 1

| Operator | Images of $e_1$ and $e_2$ | Standard Matrix |
|---|---|---|
| Reflection about the $y$-axis, $T(x, y) = (-x, y)$ | $T(e_1) = T(1, 0) = (-1, 0)$; $T(e_2) = T(0, 1) = (0, 1)$ | $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$ |
| Reflection about the $x$-axis, $T(x, y) = (x, -y)$ | $T(e_1) = T(1, 0) = (1, 0)$; $T(e_2) = T(0, 1) = (0, -1)$ | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ |
| Reflection about the line $y = x$, $T(x, y) = (y, x)$ | $T(e_1) = T(1, 0) = (0, 1)$; $T(e_2) = T(0, 1) = (1, 0)$ | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ |



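Table 2

| Operator | Images of $e_1$, $e_2$, $e_3$ | Standard Matrix |
|---|---|---|
| Reflection about the $xy$-plane, $T(x, y, z) = (x, y, -z)$ | $T(e_1) = (1, 0, 0)$; $T(e_2) = (0, 1, 0)$; $T(e_3) = (0, 0, -1)$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$ |
| Reflection about the $xz$-plane, $T(x, y, z) = (x, -y, z)$ | $T(e_1) = (1, 0, 0)$; $T(e_2) = (0, -1, 0)$; $T(e_3) = (0, 0, 1)$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ |
| Reflection about the $yz$-plane, $T(x, y, z) = (-x, y, z)$ | $T(e_1) = (-1, 0, 0)$; $T(e_2) = (0, 1, 0)$; $T(e_3) = (0, 0, 1)$ | $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ |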
Projection Operators


Matrix operators on $R^2$ and $R^3$ that map each point into its orthogonal projection on a fixed line or plane are called projection operators (or more precisely, orthogonal projection operators). Table 3 shows the standard matrices for the orthogonal projections on the coordinate axes in $R^2$, and Table 4 shows the standard matrices for the orthogonal projections on the coordinate planes in $R^3$.


Table 3

| Operator | Images of $e_1$ and $e_2$ | Standard Matrix |
|---|---|---|
| Orthogonal projection on the $x$-axis, $T(x, y) = (x, 0)$ | $T(e_1) = T(1, 0) = (1, 0)$; $T(e_2) = T(0, 1) = (0, 0)$ | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ |
| Orthogonal projection on the $y$-axis, $T(x, y) = (0, y)$ | $T(e_1) = T(1, 0) = (0, 0)$; $T(e_2) = T(0, 1) = (0, 1)$ | $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ |



Table 4

| Operator | Images of $e_1$, $e_2$, $e_3$ | Standard Matrix |
|---|---|---|
| Orthogonal projection on the $xy$-plane, $T(x, y, z) = (x, y, 0)$ | $T(e_1) = T(1, 0, 0) = (1, 0, 0)$; $T(e_2) = T(0, 1, 0) = (0, 1, 0)$; $T(e_3) = T(0, 0, 1) = (0, 0, 0)$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ |
| Orthogonal projection on the $xz$-plane, $T(x, y, z) = (x, 0, z)$ | $T(e_1) = T(1, 0, 0) = (1, 0, 0)$; $T(e_2) = T(0, 1, 0) = (0, 0, 0)$; $T(e_3) = T(0, 0, 1) = (0, 0, 1)$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ |
| Orthogonal projection on the $yz$-plane, $T(x, y, z) = (0, y, z)$ | $T(e_1) = T(1, 0, 0) = (0, 0, 0)$; $T(e_2) = T(0, 1, 0) = (0, 1, 0)$; $T(e_3) = T(0, 0, 1) = (0, 0, 1)$ | $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ |



Rotation Operators


Matrix operators on $R^2$ and $R^3$ that move points along circular arcs are called rotation operators. Let us consider how to find the standard matrix for the rotation operator $T: R^2 \to R^2$ that moves points counterclockwise about the origin through an angle $\theta$ (Figure 4.9.3). As illustrated in Figure 4.9.3, the images of the standard basis vectors are

$$T(e_1) = T(1, 0) = (\cos\theta, \sin\theta) \quad \text{and} \quad T(e_2) = T(0, 1) = (-\sin\theta, \cos\theta)$$

so the standard matrix for $T$ is

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

In keeping with common usage we will denote this operator by $R_\theta$ and call

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad (13)$$

the rotation matrix for $R^2$. If $x = (x, y)$ is a vector in $R^2$, and if $w = (w_1, w_2)$ is its image under the rotation, then the relationship $w = R_\theta x$ can be written in component form as

$$\begin{aligned} w_1 &= x\cos\theta - y\sin\theta \\ w_2 &= x\sin\theta + y\cos\theta \end{aligned} \qquad (14)$$

These are called the rotation equations for $R^2$. These ideas are summarized in Table 5.


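Table 5

| Operator | Rotation Equations | Standard Matrix |
|---|---|---|
| Rotation through an angle $\theta$ | $w_1 = x\cos\theta - y\sin\theta$; $w_2 = x\sin\theta + y\cos\theta$ | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ |


In the plane, counterclockwise angles are positive and clockwise angles are negative. The rotation matrix for a clockwise rotation through $\theta$ radians (that is, a rotation through $-\theta$) can be obtained by replacing $\theta$ by $-\theta$ in 13. After simplification this yields

$$R_{-\theta} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$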
























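EXAMPLE 4 A Rotation Operator


Find the image of $x = (1, 1)$ under a rotation of $\pi/6$ radians ($= 30°$) about the origin.


It follows from 13 with $\theta = \pi/6$ that

$$R_{\pi/6}\,x = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{3} - 1}{2} \\ \frac{1 + \sqrt{3}}{2} \end{bmatrix} \approx \begin{bmatrix} 0.37 \\ 1.37 \end{bmatrix}$$

or in comma-delimited notation, $R_{\pi/6}(1, 1) \approx (0.37, 1.37)$.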
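Example 4 can be checked with the rotation equations 14. The sketch below builds $R_\theta$ from Formula 13 and applies it to a point; angles are in radians, and the function names `rotation_matrix` and `rotate` are illustrative only.

```python
import math


def rotation_matrix(theta):
    """R_theta from Formula 13: counterclockwise rotation about the origin through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def rotate(point, theta):
    """Image of point = (x, y) under R_theta, i.e. the rotation equations 14."""
    (r11, r12), (r21, r22) = rotation_matrix(theta)
    x, y = point
    return (r11 * x + r12 * y, r21 * x + r22 * y)


w = rotate((1, 1), math.pi / 6)
print(tuple(round(t, 2) for t in w))  # (0.37, 1.37)
```

Rotating by $\theta$ and then by $-\theta$ returns the original point up to floating-point rounding, consistent with the formula for $R_{-\theta}$.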


Rotations in R 3 

A rotation of vectors in $R^3$ is usually described in relation to a ray emanating from the origin, called the axis of rotation. As a vector revolves around the axis of rotation, it sweeps out some portion of a cone (Figure 4.9.4a). The angle of rotation, which is measured in the base of the cone, is described as “clockwise” or “counterclockwise” in relation to a viewpoint that is along the axis of rotation looking toward the origin. For example, in Figure 4.9.4a the vector $w$ results from rotating the vector $x$ counterclockwise around the axis $l$ through an angle $\theta$. As in $R^2$, angles are positive if they are generated by counterclockwise rotations and negative if they are generated by clockwise rotations.

Figure 4.9.4 (a) Angle of rotation. (b) Right-hand rule.

The most common way of describing a general axis of rotation is to specify a nonzero vector $u$ that runs along the axis of rotation and has its initial point at the origin. The counterclockwise direction for a rotation about the axis can then be determined by a “right-hand rule” (Figure 4.9.4b): If the thumb of the right hand points in the direction of $u$, then the cupped fingers point in a counterclockwise direction.

A rotation operator on $R^3$ is a matrix operator that rotates each vector in $R^3$ about some rotation axis through a fixed angle $\theta$. In Table 6 we have described the rotation operators on $R^3$ whose axes of rotation are the positive coordinate axes. For each of these rotations one of the components is unchanged, and the relationships between the other components can be derived by the same procedure used to derive 14. For example, in the rotation about the $z$-axis, the $z$-components of $x$ and $w = T(x)$ are the same, and the $x$- and $y$-components are related as in 14. This yields the rotation equations shown in the last row of Table 6.
















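Table 6

| Operator | Rotation Equations | Standard Matrix |
|---|---|---|
| Counterclockwise rotation about the positive $x$-axis through an angle $\theta$ | $w_1 = x$; $w_2 = y\cos\theta - z\sin\theta$; $w_3 = y\sin\theta + z\cos\theta$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$ |
| Counterclockwise rotation about the positive $y$-axis through an angle $\theta$ | $w_1 = x\cos\theta + z\sin\theta$; $w_2 = y$; $w_3 = -x\sin\theta + z\cos\theta$ | $\begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}$ |
| Counterclockwise rotation about the positive $z$-axis through an angle $\theta$ | $w_1 = x\cos\theta - y\sin\theta$; $w_2 = x\sin\theta + y\cos\theta$; $w_3 = z$ | $\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$ |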


For completeness, we note that the standard matrix for a counterclockwise rotation through an angle $\theta$ about an axis in $R^3$, which is determined by an arbitrary unit vector $u = (a, b, c)$ that has its initial point at the origin, is

$$\begin{bmatrix} a^2(1 - \cos\theta) + \cos\theta & ab(1 - \cos\theta) - c\sin\theta & ac(1 - \cos\theta) + b\sin\theta \\ ab(1 - \cos\theta) + c\sin\theta & b^2(1 - \cos\theta) + \cos\theta & bc(1 - \cos\theta) - a\sin\theta \\ ac(1 - \cos\theta) - b\sin\theta & bc(1 - \cos\theta) + a\sin\theta & c^2(1 - \cos\theta) + \cos\theta \end{bmatrix} \qquad (15)$$


The derivation can be found in the book Principles of Interactive Computer Graphics , by W. M. Newman and R. F. 
Sproull (New York: McGraw-Hill, 1979). You may find it instructive to derive the results in Table 6 as special cases of 
this more general result. 
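As the text suggests, the results in Table 6 can be recovered as special cases of Formula 15. The sketch below codes Formula 15 directly and compares it with the Table 6 matrix for the $z$-axis; it assumes $u$ is supplied as a unit vector (it raises an error otherwise), and the function names `axis_angle_matrix` and `close` are hypothetical.

```python
import math


def axis_angle_matrix(u, theta):
    """Formula 15: counterclockwise rotation through theta about the axis of the unit vector u = (a, b, c)."""
    a, b, c = u
    if not math.isclose(math.sqrt(a * a + b * b + c * c), 1.0):
        raise ValueError("u must be a unit vector")
    C, S = math.cos(theta), math.sin(theta)
    t = 1 - C
    return [
        [a * a * t + C,     a * b * t - c * S, a * c * t + b * S],
        [a * b * t + c * S, b * b * t + C,     b * c * t - a * S],
        [a * c * t - b * S, b * c * t + a * S, c * c * t + C],
    ]


def close(M, N, tol=1e-12):
    """True if two 3 x 3 matrices agree entrywise to within tol."""
    return all(abs(M[i][j] - N[i][j]) <= tol for i in range(3) for j in range(3))


theta = math.pi / 3
Rz = [[math.cos(theta), -math.sin(theta), 0],   # last row of Table 6
      [math.sin(theta), math.cos(theta), 0],
      [0, 0, 1]]
print(close(axis_angle_matrix((0, 0, 1), theta), Rz))  # True
```

Substituting $u = (1, 0, 0)$ or $u = (0, 1, 0)$ reproduces the other two rows of Table 6 in the same way.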


Dilations and Contractions

If $k$ is a nonnegative scalar, then the operator $T(x) = kx$ on $R^2$ or $R^3$ has the effect of increasing or decreasing the length of each vector by a factor of $k$. If $0 < k < 1$ the operator is called a contraction with factor $k$, and if $k > 1$ it is called a dilation with factor $k$ (Figure 4.9.5). If $k = 1$, then $T$ is the identity operator and can be regarded either as a contraction or a dilation. Tables 7 and 8 illustrate these operators.

Figure 4.9.5 (a) $0 < k < 1$. (b) $k > 1$.


Table 7

| Operator | Effect on the Standard Basis | Standard Matrix |
|---|---|---|
| Contraction with factor $k$ on $R^2$ ($0 < k < 1$), $T(x, y) = (kx, ky)$ | $(1, 0) \to (k, 0)$; $(0, 1) \to (0, k)$ | $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$ |
| Dilation with factor $k$ on $R^2$ ($k > 1$), $T(x, y) = (kx, ky)$ | $(1, 0) \to (k, 0)$; $(0, 1) \to (0, k)$ | $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$ |


Table 8

| Operator | Formula | Standard Matrix |
|---|---|---|
| Contraction with factor $k$ on $R^3$ ($0 < k < 1$) | $T(x, y, z) = (kx, ky, kz)$ | $\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}$ |
| Dilation with factor $k$ on $R^3$ ($k > 1$) | $T(x, y, z) = (kx, ky, kz)$ | $\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}$ |




Yaw, Pitch, and Roll

In aeronautics and astronautics, the orientation of an aircraft or space shuttle relative to an $xyz$-coordinate system is often described in terms of angles called yaw, pitch, and roll. If, for example, an aircraft is flying along the $y$-axis and the $xy$-plane defines the horizontal, then the aircraft’s angle of rotation about the $z$-axis is called the yaw, its angle of rotation about the $x$-axis is called the pitch, and its angle of rotation about the $y$-axis is called the roll. A combination of yaw, pitch, and roll can be achieved by a single rotation about some axis through the origin. This is, in fact, how a space shuttle makes attitude adjustments: it doesn't perform each rotation separately; it calculates one axis, and rotates about that axis to get the correct orientation. Such rotation maneuvers are used to align an antenna, point the nose toward a celestial object, or position a payload bay for docking.



Expansions and Compressions


In a dilation or contraction of $R^2$ or $R^3$, all coordinates are multiplied by a factor $k$. If only one of the coordinates is multiplied by $k$, then the resulting operator is called an expansion or compression with factor $k$. This is illustrated in Table 9 for $R^2$. You should have no trouble extending these results to $R^3$.


Table 9

| Operator | Effect on the Standard Basis | Standard Matrix |
|---|---|---|
| Compression of $R^2$ in the $x$-direction with factor $k$ ($0 < k < 1$), $T(x, y) = (kx, y)$ | $(1, 0) \to (k, 0)$; $(0, 1) \to (0, 1)$ | $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$ |
| Expansion of $R^2$ in the $x$-direction with factor $k$ ($k > 1$), $T(x, y) = (kx, y)$ | $(1, 0) \to (k, 0)$; $(0, 1) \to (0, 1)$ | $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$ |
| Compression of $R^2$ in the $y$-direction with factor $k$ ($0 < k < 1$), $T(x, y) = (x, ky)$ | $(1, 0) \to (1, 0)$; $(0, 1) \to (0, k)$ | $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$ |
| Expansion of $R^2$ in the $y$-direction with factor $k$ ($k > 1$), $T(x, y) = (x, ky)$ | $(1, 0) \to (1, 0)$; $(0, 1) \to (0, k)$ | $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$ |


Shears


A matrix operator of the form $T(x, y) = (x + ky, y)$ translates a point $(x, y)$ in the $xy$-plane parallel to the $x$-axis by an amount $ky$ that is proportional to the $y$-coordinate of the point. This operator leaves the points on the $x$-axis fixed (since $y = 0$), but as we progress away from the $x$-axis, the translation distance increases. We call this operator the shear in the $x$-direction with factor $k$. Similarly, a matrix operator of the form $T(x, y) = (x, y + kx)$ is called the shear in the $y$-direction with factor $k$. Table 10 illustrates the basic information about shears in $R^2$.


Table 10

| Operator | Effect on the Standard Basis | Standard Matrix |
|---|---|---|
| Shear of $R^2$ in the $x$-direction with factor $k$, $T(x, y) = (x + ky, y)$ | $(1, 0) \to (1, 0)$; $(0, 1) \to (k, 1)$ | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ |
| Shear of $R^2$ in the $y$-direction with factor $k$, $T(x, y) = (x, y + kx)$ | $(1, 0) \to (1, k)$; $(0, 1) \to (0, 1)$ | $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$ |


EXAMPLE 5 Some Basic Matrix Operators on $R^2$


In each part describe the matrix operator corresponding to $A$, and show its effect on the unit square.

(a) $A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ (b) $A_2 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ (c) $A_3 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$

By comparing the forms of these matrices to those in Tables 7, 9, and 10, we see that the matrix $A_1$ corresponds to a shear in the $x$-direction with factor 2, the matrix $A_2$ corresponds to a dilation with factor 2, and $A_3$ corresponds to an expansion in the $x$-direction with factor 2. The effects of these operators on the unit square are shown in Figure 4.9.6.
Figure 4.9.6
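The effect of each operator in Example 5 on the unit square can be computed from the images of the square's four vertices, because a matrix operator maps the line segment joining two points to the segment joining their images. A minimal sketch, with illustrative helper names (`mat_vec`, `image_of_square`) that are not from the text:

```python
def mat_vec(A, x):
    """Return Ax for a 2 x 2 matrix A and a vector x in R^2."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


UNIT_SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]  # vertices, counterclockwise


def image_of_square(A):
    """Images of the unit-square vertices; the image of the square is the parallelogram they span."""
    return [mat_vec(A, v) for v in UNIT_SQUARE]


A1 = [[1, 2], [0, 1]]  # shear in the x-direction with factor 2
A2 = [[2, 0], [0, 2]]  # dilation with factor 2
A3 = [[2, 0], [0, 1]]  # expansion in the x-direction with factor 2

for name, A in [("A1", A1), ("A2", A2), ("A3", A3)]:
    print(name, image_of_square(A))
# A1 [(0, 0), (1, 0), (3, 1), (2, 1)]
# A2 [(0, 0), (2, 0), (2, 2), (0, 2)]
# A3 [(0, 0), (2, 0), (2, 1), (0, 1)]
```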


OPTIONAL 

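Orthogonal Projections on Lines Through the Origin


In Table 3 we listed the standard matrices for the orthogonal projections on the coordinate axes in $R^2$. These are special cases of the more general operator $T: R^2 \to R^2$ that maps each point into its orthogonal projection on a line $L$ through the origin that makes an angle $\theta$ with the positive $x$-axis (Figure 4.9.7). In Example 4 of Section 3.3 we used Formula 10 of that section to find the orthogonal projections of the standard basis vectors for $R^2$ on that line. Expressed in matrix form, we found those projections to be

$$T(e_1) = \begin{bmatrix} \cos^2\theta \\ \sin\theta\cos\theta \end{bmatrix} \quad \text{and} \quad T(e_2) = \begin{bmatrix} \sin\theta\cos\theta \\ \sin^2\theta \end{bmatrix}$$

Thus, the standard matrix for $T$ is

$$[T] = [\,T(e_1) \mid T(e_2)\,] = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \frac{1}{2}\sin 2\theta \\ \frac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix}$$

In keeping with common usage, we will denote this operator by

$$P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \frac{1}{2}\sin 2\theta \\ \frac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix} \qquad (16)$$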


We have included two versions of Formula 16 because both are commonly used. Whereas the first version involves only the angle $\theta$, the second involves both $\theta$ and $2\theta$.
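EXAMPLE 6 Orthogonal Projection on a Line Through the Origin


Use Formula 16 to find the orthogonal projection of the vector $x = (1, 5)$ on the line through the origin that makes an angle of $\pi/6$ ($= 30°$) with the $x$-axis.

Since $\sin(\pi/6) = 1/2$ and $\cos(\pi/6) = \sqrt{3}/2$, it follows from 16 that the standard matrix for this projection is

$$P_{\pi/6} = \begin{bmatrix} \cos^2(\pi/6) & \sin(\pi/6)\cos(\pi/6) \\ \sin(\pi/6)\cos(\pi/6) & \sin^2(\pi/6) \end{bmatrix} = \begin{bmatrix} \frac{3}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{1}{4} \end{bmatrix}$$

Thus,

$$P_{\pi/6}\,x = \begin{bmatrix} \frac{3}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{1}{4} \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \frac{3 + 5\sqrt{3}}{4} \\ \frac{\sqrt{3} + 5}{4} \end{bmatrix} \approx \begin{bmatrix} 2.92 \\ 1.68 \end{bmatrix}$$

or in comma-delimited notation, $P_{\pi/6}(1, 5) \approx (2.92, 1.68)$.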
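Example 6 can be checked numerically with Formula 16. The sketch below is a minimal illustration (the helper names `projection_matrix` and `mat_vec` are hypothetical); its output should agree with the exact values $(3 + 5\sqrt{3})/4$ and $(\sqrt{3} + 5)/4$ up to floating-point rounding.

```python
import math


def projection_matrix(theta):
    """P_theta from Formula 16: orthogonal projection on the line through the origin at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c],
            [s * c, s * s]]


def mat_vec(M, x):
    """Return Mx for a 2 x 2 matrix M and a vector x in R^2."""
    return tuple(M[i][0] * x[0] + M[i][1] * x[1] for i in range(2))


w = mat_vec(projection_matrix(math.pi / 6), (1, 5))
print(w)
exact = ((3 + 5 * math.sqrt(3)) / 4, (math.sqrt(3) + 5) / 4)
print(all(abs(a - b) < 1e-12 for a, b in zip(w, exact)))  # True
```

Projecting twice changes nothing ($P_\theta^2 = P_\theta$), which is a convenient extra sanity check on any implementation of Formula 16.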


Reflections About Lines Through the Origin 

In Table 1 we listed the reflections about the coordinate axes in $R^2$. These are special cases of the more general operator $H_\theta: R^2 \to R^2$ that maps each point into its reflection about a line $L$ through the origin that makes an angle $\theta$ with the positive $x$-axis (Figure 4.9.8). We could find the standard matrix for $H_\theta$ by finding the images of the standard basis vectors, but instead we will take advantage of our work on orthogonal projections by using Formula 16 for $P_\theta$ to find a formula for $H_\theta$.



Figure 4.9.8


You should be able to see from Figure 4.9.9 that for every vector $x$ in $R^2$

$$P_\theta x - x = \tfrac{1}{2}(H_\theta x - x) \quad \text{or equivalently} \quad H_\theta x = (2P_\theta - I)x$$

Thus, it follows from Theorem 4.9.2 that

$$H_\theta = 2P_\theta - I \qquad (17)$$

and hence from 16, using $2\cos^2\theta - 1 = \cos 2\theta$, $2\sin^2\theta - 1 = -\cos 2\theta$, and $2\sin\theta\cos\theta = \sin 2\theta$, that

$$H_\theta = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \qquad (18)$$


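EXAMPLE 7 Reflection About a Line Through the Origin


Find the reflection of the vector $x = (1, 5)$ about the line through the origin that makes an angle of $\pi/6$ ($= 30°$) with the $x$-axis.

Since $\sin(\pi/3) = \sqrt{3}/2$ and $\cos(\pi/3) = 1/2$, it follows from 18 that the standard matrix for this reflection is

$$H_{\pi/6} = \begin{bmatrix} \cos(\pi/3) & \sin(\pi/3) \\ \sin(\pi/3) & -\cos(\pi/3) \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix}$$

Thus,

$$H_{\pi/6}\,x = \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \frac{1 + 5\sqrt{3}}{2} \\ \frac{\sqrt{3} - 5}{2} \end{bmatrix} \approx \begin{bmatrix} 4.83 \\ -1.63 \end{bmatrix}$$

or in comma-delimited notation, $H_{\pi/6}(1, 5) \approx (4.83, -1.63)$.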
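Formula 17 gives a way to build $H_\theta$ from $P_\theta$ without memorizing Formula 18. The sketch below does exactly that and reproduces Example 7; the helper names `projection_matrix`, `reflection_matrix`, and `mat_vec` are illustrative assumptions.

```python
import math


def projection_matrix(theta):
    """P_theta from Formula 16."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c],
            [s * c, s * s]]


def reflection_matrix(theta):
    """H_theta = 2 P_theta - I (Formula 17)."""
    P = projection_matrix(theta)
    return [[2 * P[i][j] - (1 if i == j else 0) for j in range(2)] for i in range(2)]


def mat_vec(M, x):
    """Return Mx for a 2 x 2 matrix M and a vector x in R^2."""
    return tuple(M[i][0] * x[0] + M[i][1] * x[1] for i in range(2))


w = mat_vec(reflection_matrix(math.pi / 6), (1, 5))
print(tuple(round(t, 2) for t in w))  # (4.83, -1.63)
```

Comparing `reflection_matrix(theta)` entry by entry with Formula 18 for a few angles is a quick consistency check on the double-angle identities used to pass from 17 to 18.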


Show that the standard matrices in Tables 1 and 3 
are special cases of 18 and 16. 


Concept Review 

Function 

Image 




















Value 

Domain 

Codomain 

Transformation 

Operator 

Matrix transformation 
Matrix operator 
Standard matrix 

Properties of matrix transformations 

Zero transformation 

Identity operator 

Reflection operator 

Projection operator 

Rotation operator 

Rotation matrix 

Rotation equations 

Axis of rotation in 3-space 

Angle of rotation in 3-space 

Expansion operator 

Compression operator 

Shear 

Dilation 

Contraction 

Skills 

Find the domain and codomain of a transformation, and determine whether the transformation is linear. 
Find the standard matrix for a matrix transformation. 

Describe the effect of a matrix operator on the standard basis in $R^n$.


Exercise Set 4.9 

In Exercises 1-2, find the domain and codomain of the transformation $T_A(x) = Ax$.

1. (a) $A$ has size 3 × 2.

(b) $A$ has size 2 × 3.

(c) $A$ has size 3 × 3.

(d) $A$ has size 1 × 6.


Answer:


(a) Domain: $R^2$; codomain: $R^3$

(b) Domain: $R^3$; codomain: $R^2$

(c) Domain: $R^3$; codomain: $R^3$

(d) Domain: $R^6$; codomain: $R^1$

2. (a) $A$ has size 4 × 5.

(b) $A$ has size 5 × 4.

(c) $A$ has size 4 × 4.

(d) $A$ has size 3 × 1.

3. If $T(x_1, x_2) = (x_1 + x_2, -x_2, 3x_1)$, then the domain of $T$ is ____, the codomain of $T$ is ____, and the image of $x = (1, -2)$ under $T$ is ____.

Answer:

$R^2$, $R^3$, $(-1, 2, 3)$

4. If $T(x_1, x_2, x_3) = (x_1 + 2x_2, x_1 - 2x_2)$, then the domain of $T$ is ____, the codomain of $T$ is ____, and the image of $x = (0, -1, 4)$ under $T$ is ____.

5. In each part, find the domain and codomain of the transformation defined by the equations, and determine whether the transformation is linear.

(a) $w_1 = 3x_1 - 2x_2 + 4x_3$, $w_2 = 5x_1 - 8x_2 + x_3$

(b) $w_1 = 2x_1x_2 - x_2$, $w_2 = x_1 + 3x_1x_2$, $w_3 = x_1 + x_2$

(c) $w_1 = 5x_1 - x_2 + x_3$, $w_2 = -x_1 + x_2 + 7x_3$, $w_3 = 2x_1 - 4x_2 - x_3$

(d) $w_1 = x_1^2 - 3x_2 + x_3 - 2x_4$, $w_2 = 3x_1 - 4x_2 - x_3 + x_4$

Answer:

(a) Linear; $R^3 \to R^2$

(b) Nonlinear; $R^2 \to R^3$

(c) Linear; $R^3 \to R^3$

(d) Nonlinear; $R^4 \to R^2$

6. In each part, determine whether $T$ is a matrix transformation.

(a) $T(x, y) = (2x, y)$

(b) $T(x, y) = (-y, x)$

(c) $T(x, y) = (2x + y, x - y)$

(d) $T(x, y) = (x^2, y)$

(e) $T(x, y) = (x, y + 1)$

7. In each part, determine whether $T$ is a matrix transformation.

(a) $T(x, y, z) = (0, 0)$

(b) $T(x, y, z) = (1, 1)$

(c) $T(x, y, z) = (3x - 4y, 2x - 5z)$

(d) $T(x, y, z) = (y^2, z)$

(e) $T(x, y, z) = (x - 1, z)$

Answer:

$$\begin{bmatrix} 3 & 5 & -1 \\ 4 & -1 & 1 \\ 3 & 2 & -1 \end{bmatrix}; \quad T(-1, 2, 4) = (3, -2, -3)$$

10. Find the standard matrix for the operator $T$ defined by the formula.

(a) $T(x_1, x_2) = (2x_1 - x_2, x_1 + x_2)$

(b) $T(x_1, x_2) = (x_1, x_2)$

(c) $T(x_1, x_2, x_3) = (x_1 + 2x_2 + x_3, x_1 + 5x_2, x_3)$

(d) $T(x_1, x_2, x_3) = (4x_1, 7x_2, -8x_3)$

11. Find the standard matrix for the transformation T defined by the formula.

(a) $T(x_1, x_2) = (x_2, -x_1, x_1 + 3x_2, x_1 - x_2)$

(b) $T(x_1, x_2, x_3, x_4) = (7x_1 + 2x_2 - x_3 + x_4, x_2 + x_3, -x_1)$

(c) $T(x_1, x_2, x_3) = (0, 0, 0, 0, 0)$

(d) $T(x_1, x_2, x_3, x_4) = (x_4, x_1, x_3, x_2, x_1 - x_3)$


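Answer:

(a) $\begin{bmatrix} 0 & 1 \\ -1 & 0 \\ 1 & 3 \\ 1 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 7 & 2 & -1 & 1 \\ 0 & 1 & 1 & 0 \\ -1 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 \end{bmatrix}$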


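12. In each part, find $T(\mathbf{x})$, and express the answer in matrix form.

(a) $[T] = \cdots$; $\mathbf{x} = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$

(b) $[T] = \begin{bmatrix} -1 & 2 & 0 \\ 3 & 1 & 5 \end{bmatrix}$; $\mathbf{x} = \cdots$

(c) $[T] = \begin{bmatrix} \cdots \\ 3 \quad 5 \quad 7 \\ \cdots \end{bmatrix}$; $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

(d) $[T] = \begin{bmatrix} -1 & 1 \\ 2 & 4 \\ 7 & 8 \end{bmatrix}$; $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$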

13. In each part, use the standard matrix for T to find $T(\mathbf{x})$; then check the result by calculating $T(\mathbf{x})$ directly.

(a) $T(x_1, x_2) = (-x_1 + x_2, x_2)$; $\mathbf{x} = (-1, 4)$

(b) $T(x_1, x_2, x_3) = (2x_1 - x_2 + x_3, x_2 + x_3, 0)$; $\mathbf{x} = (2, 1, -3)$

Answer:

(a) $T(-1, 4) = (5, 4)$

(b) $T(2, 1, -3) = (0, -2, 0)$

14. Use matrix multiplication to find the reflection of $(-1, 2)$ about

(a) the x-axis.

(b) the y-axis.

(c) the line $y = x$.


15. Use matrix multiplication to find the reflection of $(2, -5, 3)$ about

(a) the xy-plane.

(b) the xz-plane.

(c) the yz-plane.

Answer:

(a) $(2, -5, -3)$

(b) $(2, 5, 3)$

(c) $(-2, -5, 3)$

16. Use matrix multiplication to find the orthogonal projection of $(2, -5)$ on

(a) the x-axis.

(b) the y-axis.

17. Use matrix multiplication to find the orthogonal projection of $(-2, 1, 3)$ on

(a) the xy-plane.

(b) the xz-plane.

(c) the yz-plane.

Answer:

(a) $(-2, 1, 0)$

(b) $(-2, 0, 3)$

(c) $(0, 1, 3)$

18. Use matrix multiplication to find the image of the vector $(3, -4)$ when it is rotated through an angle of

(a) $\theta = 30°$.

(b) $\theta = -60°$.

(c) $\theta = 45°$.

(d) $\theta = 90°$.

19. Use matrix multiplication to find the image of the vector $(-2, 1, 2)$ if it is rotated

(a) 30° about the x-axis.

(b) 45° about the y-axis.

(c) 90° about the z-axis.

Answer:

(a) $\left(-2, \frac{\sqrt{3} - 2}{2}, \frac{1 + 2\sqrt{3}}{2}\right)$

(b) $(0, 1, 2\sqrt{2})$

(c) $(-1, -2, 2)$



20. Find the standard matrix for the operator that rotates a vector in $R^3$ through an angle of $-60°$ about

(a) the x-axis.

(b) the y-axis.

(c) the z-axis.

21. Use matrix multiplication to find the image of the vector $(-2, 1, 2)$ if it is rotated

(a) $-30°$ about the x-axis.

(b) $-45°$ about the y-axis.

(c) $-90°$ about the z-axis.

Answer:

(a) $\left(-2, \frac{\sqrt{3} + 2}{2}, \frac{-1 + 2\sqrt{3}}{2}\right)$

(b) $(-2\sqrt{2}, 1, 0)$

(c) $(1, 2, 2)$

22. In $R^3$ the orthogonal projections on the x-axis, y-axis, and z-axis are defined by

$$T_1(x, y, z) = (x, 0, 0), \quad T_2(x, y, z) = (0, y, 0), \quad T_3(x, y, z) = (0, 0, z)$$

respectively.

(a) Show that the orthogonal projections on the coordinate axes are matrix operators, and find their standard matrices.

(b) Show that if $T: R^3 \to R^3$ is an orthogonal projection on one of the coordinate axes, then for every vector $\mathbf{x}$ in $R^3$, the vectors $T(\mathbf{x})$ and $\mathbf{x} - T(\mathbf{x})$ are orthogonal.

(c) Make a sketch showing $\mathbf{x}$ and $\mathbf{x} - T(\mathbf{x})$ in the case where T is the orthogonal projection on the x-axis.

23. Use Formula 15 to derive the standard matrices for the rotations about the x-axis, y-axis, and z-axis in $R^3$.

24. Use Formula 15 to find the standard matrix for a rotation of $\pi/2$ radians about the axis determined by the vector $\mathbf{v} = (1, 1, 1)$. [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.]

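25. Use Formula 15 to find the standard matrix for a rotation of 180° about the axis determined by the vector $\mathbf{v} = (2, 2, 1)$. [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.]

Answer:

$$\begin{bmatrix} -\frac{1}{9} & \frac{8}{9} & \frac{4}{9} \\ \frac{8}{9} & -\frac{1}{9} & \frac{4}{9} \\ \frac{4}{9} & \frac{4}{9} & -\frac{7}{9} \end{bmatrix}$$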
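As a numerical cross-check of the answer to Exercise 25, the Python sketch below builds an axis-angle rotation matrix and compares it with the matrix above. It assumes that Formula 15 of Section 4.9 is the usual axis-angle (Rodrigues) form for a unit vector $\mathbf{u} = (a, b, c)$; the function name `axis_angle_matrix` is hypothetical. The last lines also check the trace relation stated in Exercise 27.

```python
import math

def axis_angle_matrix(v, theta):
    """Standard matrix for the rotation of R^3 through theta (radians) about the
    axis determined by v. The vector v is normalized first, because the
    axis-angle formula assumes a unit vector."""
    norm = math.sqrt(sum(t * t for t in v))
    a, b, c = (t / norm for t in v)
    C, S = math.cos(theta), math.sin(theta)
    k = 1 - C
    return [
        [a * a * k + C,     a * b * k - c * S, a * c * k + b * S],
        [a * b * k + c * S, b * b * k + C,     b * c * k - a * S],
        [a * c * k - b * S, b * c * k + a * S, c * c * k + C],
    ]

R = axis_angle_matrix((2, 2, 1), math.pi)
expected = [[-1/9, 8/9, 4/9], [8/9, -1/9, 4/9], [4/9, 4/9, -7/9]]
matches = all(abs(R[i][j] - expected[i][j]) < 1e-12 for i in range(3) for j in range(3))

# Exercise 27: cos(theta) = (tr(R) - 1) / 2, so a 180-degree rotation should give -1.
cos_from_trace = (R[0][0] + R[1][1] + R[2][2] - 1) / 2
print(matches, round(cos_from_trace, 12))  # True -1.0
```

The tolerance absorbs the floating-point value of $\sin\pi$, which is tiny but not exactly zero.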

26. It can be proved that if A is a $2 \times 2$ matrix with orthonormal column vectors and for which $\det(A) = 1$, then multiplication by A is a rotation through some angle $\theta$. Verify that

$A = \cdots$

satisfies the stated conditions and find the angle of rotation.

27. The result stated in Exercise 26 can be extended to $R^3$; that is, it can be proved that if A is a $3 \times 3$ matrix with orthonormal column vectors and for which $\det(A) = 1$, then multiplication by A is a rotation about some axis through some angle $\theta$. Use Formula 15 to show that the angle of rotation satisfies the equation

$$\cos\theta = \frac{\operatorname{tr}(A) - 1}{2}$$

28. Let A be a $3 \times 3$ matrix (other than the identity matrix) satisfying the conditions stated in Exercise 27. It can be shown that if $\mathbf{x}$ is any nonzero vector in $R^3$, then the vector $\mathbf{u} = A\mathbf{x} + A^T\mathbf{x} + [1 - \operatorname{tr}(A)]\mathbf{x}$, provided it is nonzero, determines an axis of rotation when $\mathbf{u}$ is positioned with its initial point at the origin. [See “The Axis of Rotation: Analysis, Algebra, Geometry,” by Dan Kalman, Mathematics Magazine, Vol. 62, No. 4, October 1989.]

(a) Show that multiplication by 


A = 


i 

9 

8 

9 

4 

■9 


4 

'9 

4 

9 

7 

9 


is a rotation. 

(b) Find a vector of length 1 that defines an axis for the rotation. 

(c) Use the result in Exercise 27 to find the angle of rotation about the axis obtained in part (b). 
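The axis formula in Exercise 28 can be tried on a rotation whose axis is already known, namely the 180° rotation about $\mathbf{v} = (2, 2, 1)$ from Exercise 25. The sketch below uses exact fractions and the arbitrary choice $\mathbf{x} = (1, 0, 0)$; the helper names are hypothetical. The resulting $\mathbf{u}$ should be a scalar multiple of $(2, 2, 1)$.

```python
from fractions import Fraction as F

# Standard matrix of the 180-degree rotation about v = (2, 2, 1) (answer to Exercise 25).
A = [[F(-1, 9), F(8, 9), F(4, 9)],
     [F(8, 9), F(-1, 9), F(4, 9)],
     [F(4, 9), F(4, 9), F(-7, 9)]]

def mat_vec(M, x):
    """Return Mx for a matrix M given as a list of rows."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

x = [F(1), F(0), F(0)]           # a nonzero x that is not orthogonal to the axis
trace = A[0][0] + A[1][1] + A[2][2]
Ax = mat_vec(A, x)
ATx = mat_vec(transpose(A), x)
# u = Ax + A^T x + [1 - tr(A)] x
u = [p + q + (1 - trace) * t for p, q, t in zip(Ax, ATx, x)]
print(u)  # [Fraction(16, 9), Fraction(16, 9), Fraction(8, 9)], i.e. (8/9)(2, 2, 1)
```

If $\mathbf{x}$ happens to be orthogonal to the axis, the same expression gives the zero vector, so a different $\mathbf{x}$ must be tried in that case.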

29. In words, describe the geometric effect of multiplying a vector $\mathbf{x}$ by the matrix A.

(a) $A = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}$

Answer:

(a) Twice the orthogonal projection on the x-axis.

(b) Twice the reflection about the x-axis.


30. In words, describe the geometric effect of multiplying a vector x by the matrix A. 


^ A = 
(b) 

A = 


2 0 
0 3 

£ 

2 

1 

2 



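31. In words, describe the geometric effect of multiplying a vector $\mathbf{x}$ by the matrix

$$A = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix}$$

Answer:

Rotation through the angle $2\theta$.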

32. If multiplication by A rotates a vector $\mathbf{x}$ in the xy-plane through an angle $\theta$, what is the effect of multiplying $\mathbf{x}$ by $A^T$? Explain your reasoning.

33. Let $\mathbf{x}_0$ be a nonzero column vector in $R^2$, and suppose that $T: R^2 \to R^2$ is the transformation defined by the formula $T(\mathbf{x}) = \mathbf{x}_0 + R_\theta\mathbf{x}$, where $R_\theta$ is the standard matrix of the rotation of $R^2$ about the origin through the angle $\theta$. Give a geometric description of this transformation. Is it a matrix transformation? Explain.

Answer:

Rotation through the angle $\theta$ followed by translation by $\mathbf{x}_0$; not a matrix transformation, since $T(\mathbf{0}) = \mathbf{x}_0 \neq \mathbf{0}$.

34. A function of the form $f(x) = mx + b$ is commonly called a “linear function” because the graph of $y = mx + b$ is a line. Is f a matrix transformation on R?

35. Let $\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$ be a line in $R^n$, and let $T: R^n \to R^n$ be a matrix operator on $R^n$. What kind of geometric object is the image of this line under the operator T? Explain your reasoning.

Answer:

A line in $R^n$.

True-False Exercises 

In parts (a)–(i) determine whether the statement is true or false, and justify your answer.

(a) If A is a $2 \times 3$ matrix, then the domain of the transformation $T_A$ is $R^2$.

Answer:

False

(b) If A is an $m \times n$ matrix, then the codomain of the transformation $T_A$ is $R^n$.

Answer:

False

(c) If $T: R^n \to R^m$ and $T(\mathbf{0}) = \mathbf{0}$, then T is a matrix transformation.

Answer:

False

(d) If $T: R^n \to R^m$ and $T(c_1\mathbf{x} + c_2\mathbf{y}) = c_1T(\mathbf{x}) + c_2T(\mathbf{y})$ for all scalars $c_1$ and $c_2$ and all vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^n$, then T is a matrix transformation.

Answer:

True

(e) There is only one matrix transformation $T: R^n \to R^m$ such that $T(-\mathbf{x}) = -T(\mathbf{x})$ for every vector $\mathbf{x}$ in $R^n$.

Answer:

False

(f) There is only one matrix transformation $T: R^n \to R^m$ such that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x} - \mathbf{y})$ for all vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^n$.

Answer:

True

(g) If $\mathbf{b}$ is a nonzero vector in $R^n$, then $T(\mathbf{x}) = \mathbf{x} + \mathbf{b}$ is a matrix operator on $R^n$.

Answer:

False

(h) The matrix

is the standard matrix for a rotation.

Answer:

False

(i) The standard matrices of the reflections about the coordinate axes in 2-space have the form $\begin{bmatrix} a & 0 \\ 0 & -a \end{bmatrix}$, where $a = \pm 1$.

Answer:

True
4.10 Properties of Matrix Transformations

In this section we will discuss properties of matrix transformations. We will show, for example, that if several 
matrix transformations are performed in succession, then the same result can be obtained by a single matrix 
transformation that is chosen appropriately. We will also explore the relationship between the invertibility of a 
matrix and properties of the corresponding transformation. 


Compositions of Matrix Transformations 

Suppose that $T_A$ is a matrix transformation from $R^n$ to $R^k$ and $T_B$ is a matrix transformation from $R^k$ to $R^m$. If $\mathbf{x}$ is a vector in $R^n$, then $T_A$ maps this vector into a vector $T_A(\mathbf{x})$ in $R^k$, and $T_B$, in turn, maps that vector into the vector $T_B(T_A(\mathbf{x}))$ in $R^m$. This process creates a transformation from $R^n$ to $R^m$ that we call the composition of $T_B$ with $T_A$ and denote by the symbol

$$T_B \circ T_A$$

which is read “$T_B$ circle $T_A$.” As illustrated in Figure 4.10.1, the transformation $T_A$ in the formula is performed first; that is,

$$(T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) \tag{1}$$

This composition is itself a matrix transformation since

$$(T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) = B(T_A(\mathbf{x})) = B(A\mathbf{x}) = (BA)\mathbf{x}$$

which shows that it is multiplication by BA. This is expressed by the formula

$$T_B \circ T_A = T_{BA} \tag{2}$$
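Formula 2 can be checked numerically. The Python sketch below uses two small matrices chosen only for illustration (A is $3 \times 2$ and B is $2 \times 3$, so $T_A: R^2 \to R^3$ and $T_B: R^3 \to R^2$) and compares applying $T_A$ and then $T_B$ with a single multiplication by BA. The helper names `mat_vec` and `mat_mul` are hypothetical.

```python
def mat_vec(M, x):
    """Return Mx for a matrix M (list of rows) and a vector x (list)."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]

def mat_mul(M, N):
    """Return the matrix product MN (each row of M dotted with each column of N)."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

# Illustrative matrices: A is 3 x 2, so T_A: R^2 -> R^3; B is 2 x 3, so T_B: R^3 -> R^2.
A = [[1, 2], [0, 1], [3, -1]]
B = [[2, 0, 1], [-1, 4, 0]]
x = [5, -2]

two_steps = mat_vec(B, mat_vec(A, x))  # T_B(T_A(x)): apply T_A first, then T_B
one_step = mat_vec(mat_mul(B, A), x)   # T_BA(x): one multiplication by BA
print(two_steps, one_step)             # [19, -9] [19, -9]
```

The composition only makes sense because the number of rows of A (the dimension k of the intermediate space) equals the number of columns of B.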


WARNING

Just as it is not true, in general, that

$$AB = BA$$

so it is not true, in general, that

$$T_B \circ T_A = T_A \circ T_B$$

That is, order matters when matrix transformations are composed.


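[Figure: $\mathbf{x}$ in $R^n$ is mapped by $T_A$ to $T_A(\mathbf{x})$ in $R^k$, which $T_B$ maps to $T_B(T_A(\mathbf{x}))$ in $R^m$]

Figure 4.10.1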


Compositions can be defined for any finite succession of matrix transformations whose domains and ranges have the appropriate dimensions. For example, to extend Formula 2 to three factors, consider the matrix transformations

$$T_A: R^n \to R^k, \quad T_B: R^k \to R^l, \quad T_C: R^l \to R^m$$

We define the composition $(T_C \circ T_B \circ T_A): R^n \to R^m$ by

$$(T_C \circ T_B \circ T_A)(\mathbf{x}) = T_C(T_B(T_A(\mathbf{x})))$$

As above, it can be shown that this is a matrix transformation whose standard matrix is CBA and that

$$T_C \circ T_B \circ T_A = T_{CBA} \tag{3}$$


As in Formula 9 of Section 4.9, we can use square brackets to denote the standard matrix of a matrix transformation without referencing a specific matrix. Thus, for example, the formula

$$[T_2 \circ T_1] = [T_2][T_1] \tag{4}$$

is a restatement of Formula 2, which states that the standard matrix for a composition is the product of the standard matrices in the appropriate order. Similarly,

$$[T_3 \circ T_2 \circ T_1] = [T_3][T_2][T_1] \tag{5}$$

is a restatement of Formula 3.

EXAMPLE 1 Composition of Two Rotations 

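Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the matrix operators that rotate vectors through the angles $\theta_1$ and $\theta_2$, respectively. Thus the operation

$$(T_2 \circ T_1)(\mathbf{x}) = T_2(T_1(\mathbf{x}))$$

first rotates $\mathbf{x}$ through the angle $\theta_1$, then rotates $T_1(\mathbf{x})$ through the angle $\theta_2$. It follows that the net effect of $T_2 \circ T_1$ is to rotate each vector in $R^2$ through the angle $\theta_1 + \theta_2$ (Figure 4.10.2). Thus, the standard matrices for these matrix operators are

$$[T_1] = \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix},$$

$$[T_2 \circ T_1] = \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix}$$

These matrices should satisfy Formula 4. With the help of some basic trigonometric identities, we can confirm that this is so as follows:

$$\begin{aligned}
[T_2][T_1] &= \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix} \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix} \\
&= \begin{bmatrix} \cos\theta_2\cos\theta_1 - \sin\theta_2\sin\theta_1 & -(\cos\theta_2\sin\theta_1 + \sin\theta_2\cos\theta_1) \\ \sin\theta_2\cos\theta_1 + \cos\theta_2\sin\theta_1 & -\sin\theta_2\sin\theta_1 + \cos\theta_2\cos\theta_1 \end{bmatrix} \\
&= \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix} = [T_2 \circ T_1]
\end{aligned}$$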
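The identity $[T_2][T_1] = [T_2 \circ T_1]$ in Example 1 can also be spot-checked numerically. In this sketch the two angles are arbitrary choices, and the helper names are hypothetical; floating-point results are compared with a small tolerance rather than for exact equality.

```python
import math

def rotation(theta):
    """Standard matrix for the counterclockwise rotation of R^2 through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def mat_mul(M, N):
    """Product MN of two matrices given as lists of rows."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

def close(M, N, tol=1e-12):
    """True when M and N agree entry by entry to within tol."""
    return all(abs(a - b) <= tol for ra, rb in zip(M, N) for a, b in zip(ra, rb))

theta1, theta2 = math.radians(25), math.radians(70)  # arbitrary test angles
product = mat_mul(rotation(theta2), rotation(theta1))  # [T2][T1]
print(close(product, rotation(theta1 + theta2)))       # True
```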



EXAMPLE 2 Composition Is Not Commutative 


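Let $T_1: R^2 \to R^2$ be the reflection about the line $y = x$, and let $T_2: R^2 \to R^2$ be the orthogonal projection on the y-axis. Figure 4.10.3 illustrates graphically that $T_1 \circ T_2$ and $T_2 \circ T_1$ have different effects on a vector $\mathbf{x}$. This same conclusion can be reached by showing that the standard matrices for $T_1$ and $T_2$ do not commute:

$$[T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

$$[T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$$

so $[T_2 \circ T_1] \neq [T_1 \circ T_2]$.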
Figure 4.10.3
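The two products in Example 2 can be reproduced directly, and applying each composite to a test vector makes the difference concrete. The vector $\mathbf{x} = (1, 2)$ is an arbitrary choice and the helper names are hypothetical.

```python
def mat_mul(M, N):
    """Product MN of two matrices given as lists of rows."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

def mat_vec(M, x):
    """Return Mx for a matrix M given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

T1 = [[0, 1], [1, 0]]  # reflection about the line y = x
T2 = [[0, 0], [0, 1]]  # orthogonal projection on the y-axis

T1_after_T2 = mat_mul(T1, T2)  # [T1 o T2] = [T1][T2]
T2_after_T1 = mat_mul(T2, T1)  # [T2 o T1] = [T2][T1]
x = [1, 2]                     # arbitrary test vector
print(T1_after_T2, mat_vec(T1_after_T2, x))  # [[0, 1], [0, 0]] [2, 0]
print(T2_after_T1, mat_vec(T2_after_T1, x))  # [[0, 0], [1, 0]] [0, 1]
```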


EXAMPLE 3 Composition of Two Reflections 

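Let $T_1: R^2 \to R^2$ be the reflection about the y-axis, and let $T_2: R^2 \to R^2$ be the reflection about the x-axis. In this case $T_1 \circ T_2$ and $T_2 \circ T_1$ are the same; both map every vector $\mathbf{x} = (x, y)$ into its negative $-\mathbf{x} = (-x, -y)$ (Figure 4.10.4):

$$(T_1 \circ T_2)(x, y) = T_1(x, -y) = (-x, -y)$$

$$(T_2 \circ T_1)(x, y) = T_2(-x, y) = (-x, -y)$$

The equality of $T_1 \circ T_2$ and $T_2 \circ T_1$ can also be deduced by showing that the standard matrices for $T_1$ and $T_2$ commute:

$$[T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

$$[T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

The operator $T(\mathbf{x}) = -\mathbf{x}$ on $R^2$ or $R^3$ is called the reflection about the origin. As the foregoing computations show, the standard matrix for this operator on $R^2$ is

$$[T] = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$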

Figure 4.10.4 
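A two-line check confirms that the reflection matrices in Example 3 commute and that both products equal the standard matrix of the reflection about the origin. As before, `mat_mul` is a hypothetical helper.

```python
def mat_mul(M, N):
    """Product MN of two matrices given as lists of rows."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

T1 = [[-1, 0], [0, 1]]  # reflection about the y-axis
T2 = [[1, 0], [0, -1]]  # reflection about the x-axis

# Both orders give the reflection about the origin, -I.
print(mat_mul(T1, T2) == mat_mul(T2, T1) == [[-1, 0], [0, -1]])  # True
```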


EXAMPLE 4 Composition of Three Transformations 

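Find the standard matrix for the operator $T: R^3 \to R^3$ that first rotates a vector counterclockwise about the z-axis through an angle $\theta$, then reflects the resulting vector about the yz-plane, and then projects that vector orthogonally onto the xy-plane.

The operator T can be expressed as the composition

$$T = T_3 \circ T_2 \circ T_1$$

where $T_1$ is the rotation about the z-axis, $T_2$ is the reflection about the yz-plane, and $T_3$ is the orthogonal projection on the xy-plane. From Tables 6, 2, and 4 of Section 4.9, the standard matrices for these operators are

$$[T_1] = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Thus, it follows from Formula 5 that the standard matrix for T is

$$[T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -\cos\theta & \sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 \end{bmatrix}$$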
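The product in Example 4 can be spot-checked for a particular angle. The sketch below picks $\theta = 40°$ purely for illustration, forms $[T_3][T_2][T_1]$ in the order required by Formula 5, and compares the result with the closed form above.

```python
import math

def mat_mul(M, N):
    """Product MN of two matrices given as lists of rows."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

theta = math.radians(40)  # arbitrary test angle
c, s = math.cos(theta), math.sin(theta)
T1 = [[c, -s, 0], [s, c, 0], [0, 0, 1]]  # rotation about the z-axis
T2 = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]  # reflection about the yz-plane
T3 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]   # orthogonal projection on the xy-plane

T = mat_mul(T3, mat_mul(T2, T1))         # [T3][T2][T1]; T1 acts first
closed_form = [[-c, s, 0], [s, c, 0], [0, 0, 0]]
agrees = all(abs(T[i][j] - closed_form[i][j]) < 1e-12 for i in range(3) for j in range(3))
print(agrees)  # True
```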


One-to-One Matrix Transformations 

Our next objective is to establish a link between the invertibility of a matrix A and properties of the corresponding matrix transformation $T_A$.


DEFINITION 1

A matrix transformation $T_A: R^n \to R^m$ is said to be one-to-one if it maps distinct vectors (points) in $R^n$ into distinct vectors (points) in $R^m$.

(See Figure 4.10.5.) This idea can be expressed in various ways. For example, you should be able to see that the following are just restatements of Definition 1:

$T_A$ is one-to-one if for each vector $\mathbf{b}$ in the range of A there is exactly one vector $\mathbf{x}$ in $R^n$ such that $T_A\mathbf{x} = \mathbf{b}$.
$T_A$ is one-to-one if the equality $T_A(\mathbf{u}) = T_A(\mathbf{v})$ implies that $\mathbf{u} = \mathbf{v}$.


[Figure: two mappings from $R^n$ to $R^m$, labeled “One-to-one” and “Not one-to-one”]

Figure 4.10.5
Rotation operators on $R^2$ are one-to-one since distinct vectors that are rotated through the same angle have distinct images (Figure 4.10.6). In contrast, the orthogonal projection of $R^3$ on the xy-plane is not one-to-one because it maps distinct points on the same vertical line into the same point (Figure 4.10.7).



Figure 4.10.6 Distinct vectors $\mathbf{u}$ and $\mathbf{v}$ are rotated into distinct vectors $T(\mathbf{u})$ and $T(\mathbf{v})$.

Figure 4.10.7 The distinct points P and Q are mapped into the same point M.

The following theorem establishes a fundamental relationship between the invertibility of a matrix and properties 
of the corresponding matrix transformation. 


THEOREM 4.10.1 

If A is an $n \times n$ matrix and $T_A: R^n \to R^n$ is the corresponding matrix operator, then the following statements are equivalent.

(a) A is invertible.

(b) The range of $T_A$ is $R^n$.

(c) $T_A$ is one-to-one.

We will establish the chain of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).

(a) ⇒ (b) Assume that A is invertible. By parts (a) and (e) of Theorem 4.8.10, the system $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$ in $R^n$. This implies that $T_A$ maps $\mathbf{x}$ into the arbitrary vector $\mathbf{b}$ in $R^n$, which in turn implies that the range of $T_A$ is all of $R^n$.

(b) ⇒ (c) Assume that the range of $T_A$ is $R^n$. This implies that for every vector $\mathbf{b}$ in $R^n$ there is some vector $\mathbf{x}$ in $R^n$ for which $T_A(\mathbf{x}) = \mathbf{b}$, and hence that the linear system $A\mathbf{x} = \mathbf{b}$ is consistent for every vector $\mathbf{b}$ in $R^n$. But the equivalence of parts (e) and (f) of Theorem 4.8.10 implies that $A\mathbf{x} = \mathbf{b}$ has a unique solution for every vector $\mathbf{b}$ in $R^n$, and hence that for every vector $\mathbf{b}$ in the range of $T_A$ there is exactly one vector $\mathbf{x}$ in $R^n$ such that $T_A(\mathbf{x}) = \mathbf{b}$.

(c) ⇒ (a) Assume that $T_A$ is one-to-one. Thus, if $\mathbf{b}$ is a vector in the range of $T_A$, there is a unique vector $\mathbf{x}$ in $R^n$ for which $T_A(\mathbf{x}) = \mathbf{b}$. We leave it for you to complete the proof using Exercise 30.


EXAMPLE 5 Properties of a Rotation Operator 

As indicated in Figure 4.10.6, the operator $T: R^2 \to R^2$ that rotates vectors in $R^2$ through an angle $\theta$ is one-to-one. Confirm that $[T]$ is invertible in accordance with Theorem 4.10.1.

From Table 5 of Section 4.9 the standard matrix for T is

$$[T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

This matrix is invertible because

$$\det[T] = \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix} = \cos^2\theta + \sin^2\theta = 1 \neq 0$$


EXAMPLE 6 Properties of a Projection Operator 

As indicated in Figure 4.10.7, the operator $T: R^3 \to R^3$ that projects each vector in $R^3$ orthogonally on the xy-plane is not one-to-one. Confirm that $[T]$ is not invertible in accordance with Theorem 4.10.1.

From Table 4 of Section 4.9 the standard matrix for T is

$$[T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

This matrix is not invertible since $\det[T] = 0$.
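Examples 5 and 6 can be mirrored in a few lines of Python: the determinant of the rotation matrix is 1 (up to rounding), the determinant of the projection matrix is 0, and the projection sends two distinct points on one vertical line to the same image. The angle and the two test points are arbitrary choices, and the helpers `det2`, `det3`, and `mat_vec` are hypothetical.

```python
import math

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def mat_vec(M, x):
    """Return Mx for a matrix M given as a list of rows."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]

theta = math.radians(73)  # arbitrary test angle
rotation = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
projection = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]

print(round(det2(rotation), 12))  # 1.0 -> invertible, so the rotation is one-to-one
print(det3(projection))           # 0   -> not invertible, so the projection is not one-to-one
# Distinct points on the same vertical line have the same image:
print(mat_vec(projection, [1, 2, 3]), mat_vec(projection, [1, 2, -5]))  # [1, 2, 0] [1, 2, 0]
```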


Inverse of a One-to-One Matrix Operator 

If $T_A: R^n \to R^n$ is a one-to-one matrix operator, then it follows from Theorem 4.10.1 that A is invertible. The matrix operator

$$T_{A^{-1}}: R^n \to R^n$$

that corresponds to $A^{-1}$ is called the inverse operator or (more simply) the inverse of $T_A$. This terminology is appropriate because $T_A$ and $T_{A^{-1}}$ cancel the effect of each other in the sense that if $\mathbf{x}$ is any vector in $R^n$, then

$$T_A(T_{A^{-1}}(\mathbf{x})) = AA^{-1}\mathbf{x} = I\mathbf{x} = \mathbf{x}$$

$$T_{A^{-1}}(T_A(\mathbf{x})) = A^{-1}A\mathbf{x} = I\mathbf{x} = \mathbf{x}$$

or, equivalently,

$$T_A \circ T_{A^{-1}} = T_{AA^{-1}} = T_I$$

$$T_{A^{-1}} \circ T_A = T_{A^{-1}A} = T_I$$

From a more geometric viewpoint, if $\mathbf{w}$ is the image of $\mathbf{x}$ under $T_A$, then $T_{A^{-1}}$ maps $\mathbf{w}$ back into $\mathbf{x}$, since

$$T_{A^{-1}}(\mathbf{w}) = T_{A^{-1}}(T_A(\mathbf{x})) = \mathbf{x}$$

(Figure 4.10.8).



Figure 4.10.8 


Before considering examples, it will be helpful to touch on some notational matters. If $T_A: R^n \to R^n$ is a one-to-one matrix operator, and if $T_{A^{-1}}: R^n \to R^n$ is its inverse, then the standard matrices for these operators are related by the equation

$$T_A^{-1} = T_{A^{-1}} \tag{6}$$

In cases where it is preferable not to assign a name to the matrix, we will write this equation as

$$[T^{-1}] = [T]^{-1} \tag{7}$$


EXAMPLE 7 Standard Matrix for $T^{-1}$

Let $T: R^2 \to R^2$ be the operator that rotates each vector in $R^2$ through the angle $\theta$, so from Table 5 of Section 4.9,

$$[T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{8}$$

It is evident geometrically that to undo the effect of T, one must rotate each vector in $R^2$ through the angle $-\theta$. But this is exactly what the operator $T^{-1}$ does, since the standard matrix for $T^{-1}$ is

$$[T^{-1}] = [T]^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix}$$

(verify), which is the standard matrix for a rotation through the angle $-\theta$.
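One way to carry out the “(verify)” step numerically is sketched below: invert the rotation matrix with the $2 \times 2$ inverse formula and compare it with the rotation through $-\theta$. The angle is an arbitrary choice and the helper names are hypothetical.

```python
import math

def rotation(theta):
    """Standard matrix for the counterclockwise rotation of R^2 through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix: (1/det) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

theta = math.radians(35)  # arbitrary test angle
inv = inverse_2x2(rotation(theta))
undo = rotation(-theta)
print(all(abs(inv[i][j] - undo[i][j]) < 1e-12 for i in range(2) for j in range(2)))  # True
```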












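EXAMPLE 8 Finding $T^{-1}$

Show that the operator $T: R^2 \to R^2$ defined by the equations

$$w_1 = 2x_1 + x_2$$

$$w_2 = 3x_1 + 4x_2$$

is one-to-one, and find $T^{-1}(w_1, w_2)$.

The matrix form of these equations is

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

so the standard matrix for T is

$$[T] = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}$$

This matrix is invertible (so T is one-to-one) and the standard matrix for $T^{-1}$ is

$$[T^{-1}] = [T]^{-1} = \begin{bmatrix} \frac{4}{5} & -\frac{1}{5} \\ -\frac{3}{5} & \frac{2}{5} \end{bmatrix}$$

Thus

$$[T^{-1}]\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \frac{4}{5} & -\frac{1}{5} \\ -\frac{3}{5} & \frac{2}{5} \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \frac{4}{5}w_1 - \frac{1}{5}w_2 \\ -\frac{3}{5}w_1 + \frac{2}{5}w_2 \end{bmatrix}$$

from which we conclude that

$$T^{-1}(w_1, w_2) = \left(\tfrac{4}{5}w_1 - \tfrac{1}{5}w_2, \; -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2\right)$$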
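Example 8 can be reproduced in exact arithmetic with Python's `fractions` module. The sketch below is one possible approach, not the text's method: it inverts $[T]$ with the $2 \times 2$ formula, then applies $T$ and $T^{-1}$ in turn to an arbitrary test vector to confirm that the original vector comes back. The helper names are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Exact inverse of an invertible 2 x 2 integer matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("det = 0: the operator is not one-to-one")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_vec(M, x):
    """Return Mx for a matrix M given as a list of rows."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]

T = [[2, 1], [3, 4]]      # w1 = 2x1 + x2, w2 = 3x1 + 4x2
T_inv = inverse_2x2(T)
print(T_inv)  # [[Fraction(4, 5), Fraction(-1, 5)], [Fraction(-3, 5), Fraction(2, 5)]]

x = [1, -2]               # arbitrary test vector
w = mat_vec(T, x)         # w = T(x) = [0, -5]
print(mat_vec(T_inv, w))  # [Fraction(1, 1), Fraction(-2, 1)], i.e. x again
```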


Linearity Properties 

Up to now we have focused exclusively on matrix transformations from $R^n$ to $R^m$. However, these are not the only kinds of transformations from $R^n$ to $R^m$. For example, if $f_1, f_2, \ldots, f_m$ are any functions of the n variables $x_1, x_2, \ldots, x_n$, then the equations

$$\begin{aligned} w_1 &= f_1(x_1, x_2, \ldots, x_n) \\ w_2 &= f_2(x_1, x_2, \ldots, x_n) \\ &\;\;\vdots \\ w_m &= f_m(x_1, x_2, \ldots, x_n) \end{aligned}$$

define a transformation $T: R^n \to R^m$ that maps the vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ into the vector $(w_1, w_2, \ldots, w_m)$. But it is only in the case where these equations are linear that T is a matrix transformation. The question that we will now consider is this:

Question

Are there algebraic properties of a transformation $T: R^n \to R^m$ that can be used to determine whether T is a matrix transformation?

The answer is provided by the following theorem.


THEOREM 4.10.2 

$T: R^n \to R^m$ is a matrix transformation if and only if the following relationships hold for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^n$ and for every scalar k:

(i) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]

(ii) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]

If T is a matrix transformation, then properties (i) and (ii) follow respectively from parts (c) and (b) of Theorem 4.9.1.

Conversely, assume that properties (i) and (ii) hold. We must show that there exists an $m \times n$ matrix A such that

$$T(\mathbf{x}) = A\mathbf{x}$$

for every vector $\mathbf{x}$ in $R^n$. As a first step, recall from Formula (10) of Section 4.9 that the additivity and homogeneity properties imply that

$$T(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1T(\mathbf{u}_1) + k_2T(\mathbf{u}_2) + \cdots + k_rT(\mathbf{u}_r) \tag{9}$$

for all scalars $k_1, k_2, \ldots, k_r$ and all vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ in $R^n$. Let A be the matrix

$$A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)\,]$$

in which $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$. It follows from Theorem 1.3.1 that $A\mathbf{x}$ is a linear combination of the columns of A in which the successive coefficients are the entries $x_1, x_2, \ldots, x_n$ of $\mathbf{x}$. That is,

$$A\mathbf{x} = x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n)$$

Using Formula 9 we can rewrite this as

$$A\mathbf{x} = T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) = T(\mathbf{x})$$

which completes the proof.
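The construction in the proof, with columns $T(\mathbf{e}_1), \ldots, T(\mathbf{e}_n)$, translates directly into code. In the sketch below, the linear map `T` and the nonlinear map `S` are hypothetical examples (T is the transformation of Exercise 19(b)); the function `standard_matrix` is also a hypothetical name. The last line shows a concrete failure of additivity for S, which by the theorem rules S out as a matrix transformation.

```python
def standard_matrix(T, n):
    """Matrix whose j-th column is T(e_j), as in the proof of Theorem 4.10.2."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(T(e_j))
    return [list(row) for row in zip(*columns)]  # turn columns into rows

def mat_vec(M, x):
    """Return Mx for a matrix M given as a list of rows."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]

def T(x):
    """A linear transformation R^3 -> R^2 (example)."""
    x1, x2, x3 = x
    return [3 * x1 - 4 * x2, 2 * x1 - 5 * x3]

A = standard_matrix(T, 3)
print(A)                           # [[3, -4, 0], [2, 0, -5]]
x = [1, 2, 3]
print(mat_vec(A, x), T(x))         # [-5, -13] [-5, -13]

def S(x):
    """A nonlinear map R^2 -> R^2: squaring breaks additivity."""
    return [x[0] ** 2, x[1]]

u, v = [1, 0], [1, 0]
print(S([a + b for a, b in zip(u, v)]), [a + b for a, b in zip(S(u), S(v))])  # [4, 0] [2, 0]
```

Checking additivity on one pair of vectors can only disprove linearity; agreement on a few samples does not prove it.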

The additivity and homogeneity properties in Theorem 4.10.2 are called linearity conditions, and a transformation that satisfies these conditions is called a linear transformation. Using this terminology, Theorem 4.10.2 can be restated as follows.


THEOREM 4.10.3

Every linear transformation from $R^n$ to $R^m$ is a matrix transformation, and conversely, every matrix transformation from $R^n$ to $R^m$ is a linear transformation.


More on the Equivalence Theorem 

As our final result in this section, we will add parts (b) and (c) of Theorem 4.10.1 to Theorem 4.8.10.


THEOREM 4.10.4

Equivalent Statements

If A is an $n \times n$ matrix, then the following statements are equivalent.

(a) A is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \neq 0$.

(h) The column vectors of A are linearly independent.

(i) The row vectors of A are linearly independent.

(j) The column vectors of A span $R^n$.

(k) The row vectors of A span $R^n$.

(l) The column vectors of A form a basis for $R^n$.

(m) The row vectors of A form a basis for $R^n$.

(n) A has rank n.

(o) A has nullity 0.

(p) The orthogonal complement of the null space of A is $R^n$.

(q) The orthogonal complement of the row space of A is $\{\mathbf{0}\}$.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.
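Statements (n) and (s) together give a mechanical test for one-to-oneness: compute the rank. The sketch below is one way to do this, not the text's method. It uses Gauss–Jordan elimination in exact rational arithmetic (to avoid floating-point pivot issues), and the names `rank` and `is_one_to_one` are hypothetical. It is applied to the matrices of Examples 8 and 6.

```python
from fractions import Fraction

def rank(A):
    """Rank of A by Gauss-Jordan elimination in exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r

def is_one_to_one(A):
    """For an n x n matrix A, T_A is one-to-one exactly when A has rank n."""
    n = len(A)
    assert all(len(row) == n for row in A), "A must be n x n"
    return rank(A) == n

print(is_one_to_one([[2, 1], [3, 4]]))                   # True  (Example 8)
print(is_one_to_one([[1, 0, 0], [0, 1, 0], [0, 0, 0]]))  # False (Example 6)
```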


Concept Review 

Composition of matrix transformations 
Reflection about the origin 
One-to-one transformation 
Inverse of a matrix operator 
Linearity conditions 
Linear transformation 

Equivalent characterizations of invertible matrices 

Skills 

Find the standard matrix for a composition of matrix transformations. 

Determine whether a matrix operator is one-to-one; if it is, then find the inverse operator. 
Determine whether a transformation is a linear transformation. 


Exercise Set 4.10 

In Exercises 1–2, let $T_A$ and $T_B$ be the operators whose standard matrices are given. Find the standard matrices for $T_B \circ T_A$ and $T_A \circ T_B$.


1. $A = \begin{bmatrix} 1 & -2 & 0 \\ 4 & 1 & -3 \\ 5 & 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -3 & 3 \\ 5 & 0 & 1 \\ 6 & 1 & 7 \end{bmatrix}$

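Answer:

$$[T_B \circ T_A] = BA = \begin{bmatrix} 5 & -1 & 21 \\ 10 & -8 & 4 \\ 45 & 3 & 25 \end{bmatrix}, \quad [T_A \circ T_B] = AB = \begin{bmatrix} -8 & -3 & 1 \\ -5 & -15 & -8 \\ 44 & -11 & 45 \end{bmatrix}$$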


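2. $A = \begin{bmatrix} 6 & 3 & -1 \\ 2 & 0 & 1 \\ 4 & -3 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 0 & 4 \\ -1 & 5 & 2 \\ 2 & -3 & 8 \end{bmatrix}$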
8 


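3. Let $T_1(x_1, x_2) = (x_1 + x_2, x_1 - x_2)$ and $T_2(x_1, x_2) = (3x_1, 2x_1 + 4x_2)$.

(a) Find the standard matrices for $T_1$ and $T_2$.

(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.

(c) Use the matrices obtained in part (b) to find formulas for $T_1(T_2(x_1, x_2))$ and $T_2(T_1(x_1, x_2))$.

Answer:

(a) $[T_1] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} 3 & 0 \\ 2 & 4 \end{bmatrix}$

(b) $[T_2 \circ T_1] = \begin{bmatrix} 3 & 3 \\ 6 & -2 \end{bmatrix}, \quad [T_1 \circ T_2] = \begin{bmatrix} 5 & 4 \\ 1 & -4 \end{bmatrix}$

(c) $T_2(T_1(x_1, x_2)) = (3x_1 + 3x_2, 6x_1 - 2x_2)$,

$T_1(T_2(x_1, x_2)) = (5x_1 + 4x_2, x_1 - 4x_2)$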

4. Let $T_1(x_1, x_2, x_3) = (4x_1, -2x_1 + x_2, -x_1 - 3x_2)$ and $T_2(x_1, x_2, x_3) = (x_1 + 2x_2, -x_3, 4x_1 - x_3)$.

(a) Find the standard matrices for $T_1$ and $T_2$.

(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.

(c) Use the matrices obtained in part (b) to find formulas for $T_1(T_2(x_1, x_2, x_3))$ and $T_2(T_1(x_1, x_2, x_3))$.

5. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation of 90°, followed by a reflection about the line $y = x$.

(b) An orthogonal projection on the y-axis, followed by a contraction with factor $k = \frac{1}{2}$.

(c) A reflection about the x-axis, followed by a dilation with factor $k = 3$.

Answer:

(a) $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 0 \\ 0 & \frac{1}{2} \end{bmatrix}$

(c) $\begin{bmatrix} 3 & 0 \\ 0 & -3 \end{bmatrix}$

6. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation of 60°, followed by an orthogonal projection on the x-axis, followed by a reflection about the line $y = x$.

(b) A dilation with factor $k = 2$, followed by a rotation of 45°, followed by a reflection about the y-axis.

(c) A rotation of 15°, followed by a rotation of 105°, followed by a rotation of 60°.

7. Find the standard matrix for the stated composition in $R^3$.

(a) A reflection about the yz-plane, followed by an orthogonal projection on the xz-plane.

(b) A rotation of 45° about the y-axis, followed by a dilation with factor $k = \sqrt{2}$.

(c) An orthogonal projection on the xy-plane, followed by a reflection about the yz-plane.

Answer:

(a) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \\ -1 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

8. Find the standard matrix for the stated composition in $R^3$.

(a) A rotation of 30° about the x-axis, followed by a rotation of 30° about the z-axis, followed by a contraction with factor $k = \frac{1}{4}$.

(b) A reflection about the xy-plane, followed by a reflection about the xz-plane, followed by an orthogonal projection on the yz-plane.

(c) A rotation of 270° about the x-axis, followed by a rotation of 90° about the y-axis, followed by a rotation of 180° about the z-axis.


9. Determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the orthogonal projection on the y-axis.

(b) $T_1: R^2 \to R^2$ is the rotation through an angle $\theta_1$, and $T_2: R^2 \to R^2$ is the rotation through an angle $\theta_2$.

(c) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the rotation through an angle $\theta$.

Answer:

(a) $T_1 \circ T_2 = T_2 \circ T_1$

(b) $T_1 \circ T_2 = T_2 \circ T_1$

(c) $T_1 \circ T_2 \neq T_2 \circ T_1$

10. Determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: R^3 \to R^3$ is a dilation by a factor k, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through an angle $\theta$.

(b) $T_1: R^3 \to R^3$ is the rotation about the x-axis through an angle $\theta_1$, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through an angle $\theta_2$.

11. By inspection, determine whether the matrix operator is one-to-one.

(a) the orthogonal projection on the x-axis in $R^2$

(b) the reflection about the y-axis in $R^2$

(c) the reflection about the line $y = x$ in $R^2$

(d) a contraction with factor $k > 0$ in $R^2$

(e) a rotation about the z-axis in $R^3$

(f) a reflection about the xy-plane in $R^3$

(g) a dilation with factor $k > 0$ in $R^3$



Answer: 


(a) Not one-to-one 

(b) One-to-one 

(c) One-to-one 

(d) One-to-one 

(e) One-to-one 

(f) One-to-one 

(g) One-to-one 

12. Find the standard matrix for the matrix operator defined by the equations, and use Theorem 4.10.4 to determine whether the operator is one-to-one.

(a) $w_1 = 8x_1 + 4x_2$
$w_2 = 2x_1 + x_2$

(b) $w_1 = 2x_1 - 3x_2$
$w_2 = 5x_1 + x_2$

(c) $w_1 = -x_1 + 3x_2 + 2x_3$
$w_2 = 2x_1 + 4x_3$
$w_3 = x_1 + 3x_2 + 6x_3$

(d) $w_1 = x_1 + 2x_2 + 3x_3$
$w_2 = 2x_1 + 5x_2 + 3x_3$
$w_3 = x_1 + 8x_3$

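13. Determine whether the matrix operator $T: R^2 \to R^2$ defined by the equations is one-to-one; if so, find the standard matrix for the inverse operator, and find $T^{-1}(w_1, w_2)$.

(a) $w_1 = x_1 + 2x_2$
$w_2 = -x_1 + x_2$

(b) $w_1 = 4x_1 - 6x_2$
$w_2 = -2x_1 + 3x_2$

(c) $w_1 = -x_2$
$w_2 = -x_1$

(d) $w_1 = 3x_1$
$w_2 = -5x_1$

Answer:

(a) One-to-one; $[T^{-1}] = \begin{bmatrix} \frac{1}{3} & -\frac{2}{3} \\ \frac{1}{3} & \frac{1}{3} \end{bmatrix}$; $T^{-1}(w_1, w_2) = \left(\tfrac{1}{3}w_1 - \tfrac{2}{3}w_2, \tfrac{1}{3}w_1 + \tfrac{1}{3}w_2\right)$

(b) Not one-to-one

(c) One-to-one; $[T^{-1}] = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$; $T^{-1}(w_1, w_2) = (-w_2, -w_1)$

(d) Not one-to-one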


14. Determine whether the matrix operator $T: R^3 \to R^3$ defined by the equations is one-to-one; if so, find the standard matrix for the inverse operator, and find $T^{-1}(w_1, w_2, w_3)$.

(a) $w_1 = x_1 - 2x_2 + 2x_3$
$w_2 = 2x_1 + x_2 + x_3$
$w_3 = x_1 + x_2$

(b) $w_1 = x_1 - 3x_2 + 4x_3$
$w_2 = -x_1 + x_2 + x_3$
$w_3 = -2x_2 + 5x_3$

(c) $w_1 = x_1 + 4x_2 - x_3$
$w_2 = 2x_1 + 7x_2 + x_3$
$w_3 = x_1 + 3x_2$

(d) $w_1 = x_1 + 2x_2 + x_3$
$w_2 = -2x_1 + x_2 + 4x_3$
$w_3 = 7x_1 + 4x_2 - 5x_3$


15. By inspection, find the inverse of the given one-to-one matrix operator.

(a) The reflection about the x-axis in $R^2$.

(b) The rotation through an angle of $\pi/4$ in $R^2$.

(c) The dilation by a factor of 3 in $R^2$.

(d) The reflection about the yz-plane in $R^3$.

(e) The contraction by a factor of $\frac{1}{5}$ in $R^3$.

Answer:

(a) Reflection about the x-axis

(b) Rotation through the angle $-\pi/4$

(c) Contraction by a factor of $\frac{1}{3}$

(d) Reflection about the yz-plane

(e) Dilation by a factor of 5


In Exercises 16–17, use Theorem 4.10.2 to determine whether $T: R^2 \to R^2$ is a matrix operator.

16. (a) $T(x, y) = (2x, y)$

(b) $T(x, y) = (x^2, y)$

(c) $T(x, y) = (-y, x)$

(d) $T(x, y) = (x, 0)$

17. (a) $T(x, y) = (2x + y, x - y)$

(b) $T(x, y) = (x + 1, y)$

(c) $T(x, y) = (y, y)$

(d) $T(x, y) = \ldots$

Answer:

(a) Matrix operator

(b) Not a matrix operator

(c) Matrix operator

(d) Not a matrix operator

In Exercises 18–19, use Theorem 4.10.2 to determine whether $T: R^3 \to R^2$ is a matrix transformation.

18. (a) $T(x, y, z) = (x, x + y + z)$

(b) $T(x, y, z) = (1, 1)$

19. (a) $T(x, y, z) = (0, 0)$

(b) $T(x, y, z) = (3x - 4y, 2x - 5z)$

Answer:

(a) Matrix transformation

(b) Matrix transformation

20. In each part, use Theorem 4.10.3 to find the standard matrix for the matrix operator from the images of the standard basis vectors.

(a) The reflection operators on $R^2$ in Table 1 of Section 4.9.

(b) The reflection operators on $R^3$ in Table 2 of Section 4.9.

(c) The projection operators on $R^2$ in Table 3 of Section 4.9.

(d) The projection operators on $R^3$ in Table 4 of Section 4.9.

(e) The rotation operators on $R^2$ in Table 5 of Section 4.9.

(f) The dilation and contraction operators on $R^3$ in Table 8 of Section 4.9.

21. Find the standard matrix for the given matrix operator.

(a) $T: R^2 \to R^2$ projects a vector orthogonally onto the x-axis and then reflects that vector about the y-axis.

(b) $T: R^2 \to R^2$ reflects a vector about the line $y = x$ and then reflects that vector about the x-axis.

(c) $T: R^2 \to R^2$ dilates a vector by a factor of 3, then reflects that vector about the line $y = x$, and then projects that vector orthogonally onto the y-axis.

Answer:

(a) $\begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 \\ 3 & 0 \end{bmatrix}$


22. Find the standard matrix for the given matrix operator. 


( a ) T.B? —* R~' reflects a vector about the xz-plane and then contracts that vector by a factor of y 

(b) $T: R^3 \to R^3$ projects a vector orthogonally onto the xz-plane and then projects that vector orthogonally onto the xy-plane.

(c) $T: R^3 \to R^3$ reflects a vector about the xy-plane, then reflects that vector about the xz-plane, and then reflects that vector about the yz-plane.


23. Let $T_A: R^3 \to R^3$ be multiplication by

$$A = \begin{bmatrix} -1 & 3 & 0 \\ 2 & 1 & 2 \\ 4 & 5 & -3 \end{bmatrix}$$

and let $e_1$, $e_2$, and $e_3$ be the standard basis vectors for $R^3$. Find the following vectors by inspection.

(a) $T_A(e_1)$, $T_A(e_2)$, and $T_A(e_3)$

(b) $T_A(e_1 + e_2 + e_3)$

(c) $T_A(7e_3)$

Answer:

(a) $T_A(e_1) = (-1, 2, 4)$, $T_A(e_2) = (3, 1, 5)$, $T_A(e_3) = (0, 2, -3)$

(b) $T_A(e_1 + e_2 + e_3) = (2, 5, 6)$

(c) $T_A(7e_3) = (0, 14, -21)$
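The answers to Exercise 23 rest on the fact that $T_A(e_j)$ is the jth column of A, together with linearity. A minimal Python sketch, assuming matrices are stored as plain lists of rows (the helper name `mat_vec` is our own, not from the text), confirms all three parts:

```python
def mat_vec(M, v):
    """Return the product Mv, where M is a list of rows and v is a tuple."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

# The matrix A from Exercise 23, listed row by row.
A = [[-1, 3, 0],
     [2, 1, 2],
     [4, 5, -3]]

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Part (a): T_A(e_j) is the j-th column of A.
images = [mat_vec(A, e) for e in (e1, e2, e3)]
print(images)  # [(-1, 2, 4), (3, 1, 5), (0, 2, -3)]

# Part (b): e1 + e2 + e3 = (1, 1, 1), so the image is the sum of the columns.
print(mat_vec(A, (1, 1, 1)))  # (2, 5, 6)

# Part (c): 7e3 = (0, 0, 7), so the image is 7 times the third column.
print(mat_vec(A, (0, 0, 7)))  # (0, 14, -21)
```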

24. Determine whether multiplication by A is a one-to-one matrix transformation. 


(a) 

1 -1 


A = 

2 0 



i 

i 

CO 

_ 1 


(b) $A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & -4 \end{bmatrix}$

(c) 

1 2 

f 

A _ 

0 1 

1 

Si — 

1 1 

0 


1 0 - 

-1 


25. (a) Is a composition of one-to-one matrix transformations one-to-one? Justify your conclusion.

(b) Can the composition of a one-to-one matrix transformation and a matrix transformation that is not 
one-to-one be one-to-one? Account for both possible orders of composition and justify your conclusion. 


Answer: 



(a) Yes 

(b) Yes 

26. Show that $T(x, y) = (0, 0)$ defines a matrix operator on $R^2$ but $T(x, y) = (1, 1)$ does not.

27. (a) Prove: If $T: R^n \to R^m$ is a matrix transformation, then $T(\mathbf{0}) = \mathbf{0}$; that is, T maps the zero vector in $R^n$ into the zero vector in $R^m$.

(b) The converse of this is not true. Find an example of a function that satisfies T(0) = 0 but is not a matrix 
transformation. 

Answer: 

(b) $T(x_1, x_2) = (x_1 + x_2, x_1x_2)$
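To see why the answer to Exercise 27(b) works, note that a matrix transformation must satisfy $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$. The following Python sketch (the function name `T` simply mirrors the answer; the test vectors are our own choice) shows that this T fixes the origin yet fails additivity:

```python
def T(x1, x2):
    """The candidate map from the answer to Exercise 27(b)."""
    return (x1 + x2, x1 * x2)

# T sends the zero vector to the zero vector ...
print(T(0, 0))  # (0, 0)

# ... but it is not additive: T(u + v) differs from T(u) + T(v).
u, v = (1, 0), (0, 1)
lhs = T(u[0] + v[0], u[1] + v[1])                 # T(1, 1) = (2, 1)
rhs = tuple(a + b for a, b in zip(T(*u), T(*v)))  # (1, 0) + (1, 0) = (2, 0)
print(lhs, rhs)  # (2, 1) (2, 0)
```

Since additivity fails for a single pair of vectors, T cannot be multiplication by any matrix.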

28. Prove: An $n \times n$ matrix A is invertible if and only if the linear system $A\mathbf{x} = \mathbf{w}$ has exactly one solution for every vector $\mathbf{w}$ in $R^n$ for which the system is consistent.

29. Let A be an $n \times n$ matrix such that $\det(A) = 0$, and let $T: R^n \to R^n$ be multiplication by A.

(a) What can you say about the range of the matrix operator T? Give an example that illustrates your conclusion.

(b) What can you say about the number of vectors that T maps into $\mathbf{0}$?

Answer:

(a) The range of T is a proper subset of $R^n$.

(b) T must map infinitely many vectors to 0. 
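An illustration of both answers to Exercise 29 with n = 2: the matrix below is a hypothetical example of our own choosing with determinant 0, and `mat_vec` is a helper introduced here. This Python sketch checks that every image lies on one line and that a whole line of vectors maps to 0:

```python
def mat_vec(M, v):
    """Multiply a 2x2 matrix M (list of rows) by a vector v in R^2."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

# Hypothetical singular matrix chosen for illustration: det(A) = 1*4 - 2*2 = 0.
A = [[1, 2],
     [2, 4]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det_A)  # 0

# Every image (x + 2y, 2x + 4y) lies on the line y = 2x, so the range is not
# all of R^2; for example, (1, 0) is never an image.
samples = [(1, 0), (0, 1), (3, -5)]
print(all(mat_vec(A, v)[1] == 2 * mat_vec(A, v)[0] for v in samples))  # True

# Every multiple t(2, -1) is mapped to 0, so infinitely many vectors map to 0.
print(all(mat_vec(A, (2 * t, -t)) == (0, 0) for t in range(-5, 6)))  # True
```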

30. Prove: If the matrix transformation $T_A: R^n \to R^n$ is one-to-one, then A is invertible.

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) If $T: R^n \to R^m$ and $T(\mathbf{0}) = \mathbf{0}$, then T is a matrix transformation.

Answer: 

False 

(b) If $T: R^n \to R^m$ and $T(c_1\mathbf{x} + c_2\mathbf{y}) = c_1T(\mathbf{x}) + c_2T(\mathbf{y})$ for all scalars $c_1$ and $c_2$ and all vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^n$, then T is a matrix transformation.

Answer: 

True 

(c) If $T: R^n \to R^m$ is a one-to-one matrix transformation, then there are no distinct vectors $\mathbf{x}$ and $\mathbf{y}$ for which $T(\mathbf{x} - \mathbf{y}) = \mathbf{0}$.

Answer: 


True 


(d) If $T: R^n \to R^m$ is a matrix transformation and $m > n$, then T is one-to-one.


Answer: 

False 

(e) If $T: R^n \to R^m$ is a matrix transformation and $m = n$, then T is one-to-one.
Answer: 

False 

(f) If $T: R^n \to R^m$ is a matrix transformation and $m < n$, then T is one-to-one.
Answer: 

False 




4.11 Geometry of Matrix Operators on $R^2$

In this optional section we will discuss matrix operators on $R^2$ in a little more depth. The ideas that we will develop here
have important applications to computer graphics. 


Transformations of Regions 

In Section 4.9 we focused on the effect that a matrix operator has on individual vectors in $R^2$ and $R^3$. However, it is also
important to understand how such operators affect the shapes of regions. For example, Figure 4.11.1 shows a famous
picture of Albert Einstein and three computer-generated modifications of that image that result from matrix operators on
$R^2$. The original picture was scanned and then digitized to decompose it into a rectangular array of pixels. The pixels
were then transformed as follows:

The program MATLAB was used to assign coordinates and a gray level to each pixel. 

The coordinates of the pixels were transformed by matrix multiplication. 

The pixels were then assigned their original gray levels to produce the transformed picture. 



Figure 4.11.1 


The overall effect of a matrix operator on $R^2$ can often be ascertained by graphing the images of the vertices
(0, 0), (1, 0), (0, 1), and (1, 1) of the unit square (Figure 4.11.2). Table 1 shows the effect that some of the matrix 
operators studied in Section 4.9 have on the unit square. For clarity, we have shaded a portion of the original square and 
its corresponding image. 
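The images of the four vertices are all that is needed to sketch the image of the unit square. A minimal Python sketch of that computation (the helper names `mat_vec`, `image_of_unit_square`, and `shear_x` are our own), applied to two operators from Table 1:

```python
def mat_vec(M, v):
    """Multiply a 2x2 matrix M (list of rows) by a vector v in R^2."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

# Vertices of the unit square, in the order (0,0), (1,0), (0,1), (1,1).
UNIT_SQUARE = [(0, 0), (1, 0), (0, 1), (1, 1)]

def image_of_unit_square(M):
    """Images of the four vertices of the unit square under multiplication by M."""
    return [mat_vec(M, p) for p in UNIT_SQUARE]

def shear_x(k):
    """Standard matrix for a shear in the x-direction with factor k."""
    return [[1, k], [0, 1]]

reflect_about_y_axis = [[-1, 0], [0, 1]]

print(image_of_unit_square(reflect_about_y_axis))  # [(0, 0), (-1, 0), (0, 1), (-1, 1)]
print(image_of_unit_square(shear_x(2)))            # [(0, 0), (1, 0), (2, 1), (3, 1)]
```

Joining the image points in the same order as the original vertices gives the image region.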
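Table 1

Operator; standard matrix; effect on the unit square (image of the vertex (1, 1)):

Reflection about the y-axis: $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$; (1, 1) maps to (−1, 1).

Reflection about the x-axis: $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$; (1, 1) maps to (1, −1).

Reflection about the line y = x: $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$; (1, 1) maps to (1, 1).

Counterclockwise rotation through an angle θ: $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$; (1, 1) maps to (cos θ − sin θ, sin θ + cos θ).

Compression in the x-direction by a factor of k (0 < k < 1): $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$; (1, 1) maps to (k, 1).

Expansion in the x-direction by a factor of k (k > 1): $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$; (1, 1) maps to (k, 1).

Shear in the x-direction with factor k > 0: $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$; (x, y) maps to (x + ky, y), so (1, 1) maps to (1 + k, 1).

Shear in the x-direction with factor k < 0: $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$; (x, y) maps to (x + ky, y), so (1, 1) maps to (1 + k, 1).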

































EXAMPLE 1 Transforming with Diagonal Matrices 


Suppose that the xy-plane first is compressed or expanded by a factor of $k_1$ in the x-direction and then is
compressed or expanded by a factor of $k_2$ in the y-direction. Find a single matrix operator that performs
both operations.

Solution

The standard matrices for the two operations are

$$\underbrace{\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}}_{x\text{-compression (expansion)}} \qquad \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}}_{y\text{-compression (expansion)}}$$

Thus, the standard matrix for the composition of the x-operation followed by the y-operation is

$$A = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} \tag{1}$$

This shows that multiplication by a diagonal 2 × 2 matrix compresses or expands the plane in the
x-direction and also in the y-direction. In the special case where $k_1$ and $k_2$ are the same, say $k_1 = k_2 = k$,
Formula 1 simplifies to

$$A = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$$

which is a contraction or a dilation (Table 7 of Section 4.9).
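Formula 1 can be checked by multiplying the two factors directly. In this Python sketch (plain lists of rows; `mat_mul` and `diagonal_operator` are names of our own), the x-operation is the right-hand factor because it acts first:

```python
def mat_mul(P, Q):
    """Product PQ of two 2x2 matrices given as lists of rows."""
    return [[sum(P[i][m] * Q[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def diagonal_operator(k1, k2):
    """Standard matrix for scaling by k1 in the x-direction, then by k2 in the y-direction."""
    x_op = [[k1, 0], [0, 1]]  # x-compression (expansion)
    y_op = [[1, 0], [0, k2]]  # y-compression (expansion)
    # The x-operation is applied first, so it is the right-hand factor.
    return mat_mul(y_op, x_op)

print(diagonal_operator(3, 2))  # [[3, 0], [0, 2]]
print(diagonal_operator(4, 4))  # [[4, 0], [0, 4]], a dilation with k = 4
```

For these two diagonal factors the order does not actually matter (diagonal matrices commute), which is not true of the shear and reflection in the next example.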


EXAMPLE 2 Finding Matrix Operators 

(a) Find the standard matrix for the operator on $R^2$ that first shears by a factor of 2 in the x-direction and
then reflects the result about the line y = x. Sketch the image of the unit square under this operator.

(b) Find the standard matrix for the operator on $R^2$ that first reflects about y = x and then shears by a
factor of 2 in the x-direction. Sketch the image of the unit square under this operator.

(c) Confirm that the shear and the reflection in parts (a) and (b) do not commute.

Solution

(a) The standard matrix for the shear is

$$A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$

and for the reflection is

$$A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Thus, the standard matrix for the shear followed by the reflection is

$$A_2A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}$$

(b) The standard matrix for the reflection followed by the shear is

$$A_1A_2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}$$

(c) The computations in Solutions (a) and (b) show that $A_1A_2 \neq A_2A_1$, so the standard matrices, and
hence the operators, do not commute. The same conclusion follows from Figures 4.11.3 and 4.11.4,
since the two operators produce different images of the unit square.
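The non-commutativity in part (c) is easy to confirm numerically. A minimal Python sketch (the helper `mat_mul` is our own; matrices are lists of rows):

```python
def mat_mul(P, Q):
    """Product PQ of two 2x2 matrices given as lists of rows."""
    return [[sum(P[i][m] * Q[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

A1 = [[1, 2], [0, 1]]  # shear by a factor of 2 in the x-direction
A2 = [[0, 1], [1, 0]]  # reflection about the line y = x

shear_then_reflect = mat_mul(A2, A1)  # part (a): A1 acts first
reflect_then_shear = mat_mul(A1, A2)  # part (b): A2 acts first

print(shear_then_reflect)  # [[0, 1], [1, 2]]
print(reflect_then_shear)  # [[2, 1], [1, 0]]
print(shear_then_reflect == reflect_then_shear)  # False
```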
Geometry of One-to-One Matrix Operators

We will now turn our attention to one-to-one matrix operators on $R^2$, which are important because they map distinct
points into distinct points. Recall from Theorem 4.10.4 (the Equivalence Theorem) that a matrix operator $T_A$ is
one-to-one if and only if A can be expressed as a product of elementary matrices. Thus, we can analyze the effect of any
one-to-one operator $T_A$ by first factoring the matrix A into a product of elementary matrices, say

$$A = E_1E_2 \cdots E_r$$

and then expressing $T_A$ as the composition

$$T_A = T_{E_1E_2\cdots E_r} = T_{E_1} \circ T_{E_2} \circ \cdots \circ T_{E_r} \tag{2}$$


The following theorem explains the geometric effect of matrix operators corresponding to elementary matrices. 























THEOREM 4.11.1 


If E is an elementary matrix, then $T_E: R^2 \to R^2$ is one of the following:

(a) A shear along a coordinate axis. 

(b) A reflection about y = x. 

(c) A compression along a coordinate axis. 

(d) An expansion along a coordinate axis. 

(e) A reflection about a coordinate axis. 

(f) A compression or expansion along a coordinate axis followed by a reflection about a coordinate axis. 


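Because a 2 × 2 elementary matrix results from performing a single elementary row operation on the 2 × 2
identity matrix, such a matrix must have one of the following forms (verify):

$$\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$$

The first two matrices represent shears along coordinate axes, and the third represents a reflection about y = x. If k > 0,
the last two matrices represent compressions or expansions along coordinate axes, depending on whether 0 < k < 1 or
k > 1. If k < 0, and if we express k in the form $k = -k_1$ where $k_1 > 0$, then the last two matrices can be written as

$$\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3}$$

$$\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -k_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & k_1 \end{bmatrix} \tag{4}$$

Since $k_1 > 0$, the product in 3 represents a compression or expansion along the x-axis followed by a reflection about the
y-axis, and 4 represents a compression or expansion along the y-axis followed by a reflection about the x-axis. In the
case where k = −1, transformations 3 and 4 are simply reflections about the y-axis and x-axis, respectively.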
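Formulas 3 and 4 can be spot-checked for particular negative values of k. This Python sketch is an illustration only; the function `check_negative_scaling` and the sample values of k are our own choices:

```python
def mat_mul(P, Q):
    """Product PQ of two 2x2 matrices given as lists of rows."""
    return [[sum(P[i][m] * Q[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def check_negative_scaling(k):
    """Check Formulas (3) and (4) for a negative k, writing k = -k1 with k1 > 0."""
    assert k < 0
    k1 = -k
    reflect_about_y_axis = [[-1, 0], [0, 1]]
    reflect_about_x_axis = [[1, 0], [0, -1]]
    scale_x = [[k1, 0], [0, 1]]  # compression/expansion along the x-axis
    scale_y = [[1, 0], [0, k1]]  # compression/expansion along the y-axis
    formula_3 = mat_mul(reflect_about_y_axis, scale_x) == [[k, 0], [0, 1]]
    formula_4 = mat_mul(reflect_about_x_axis, scale_y) == [[1, 0], [0, k]]
    return formula_3 and formula_4

print(check_negative_scaling(-3))     # True
print(check_negative_scaling(-0.25))  # True
```

Note that the scaling matrix is the right-hand factor in each product, matching the statement that the compression or expansion is applied first and the reflection second.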


Since every invertible matrix is a product of elementary matrices, the following result follows from Theorem 4.11.1 and 
Formula 2. 


THEOREM 4.11.2 

If $T_A: R^2 \to R^2$ is multiplication by an invertible matrix A, then the geometric effect of $T_A$ is the same as an
appropriate succession of shears, compressions, expansions, and reflections.


EXAMPLE 3 Analyzing the Geometric Effect of a Matrix Operator 


Assuming that $k_1$ and $k_2$ are positive, express the diagonal matrix

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}$$

as a product of elementary matrices, and describe the geometric effect of multiplication by A in terms of
compressions and expansions.

Solution

From Example 1 we have

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}$$

which shows that multiplication by A has the geometric effect of compressing or expanding by a factor of
$k_1$ in the x-direction and then compressing or expanding by a factor of $k_2$ in the y-direction.


EXAMPLE 4 Analyzing the Geometric Effect of a Matrix Operator 


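Express

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

as a product of elementary matrices, and then describe the geometric effect of multiplication by A in terms
of shears, compressions, expansions, and reflections.

Solution

A can be reduced to I as follows:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \to \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The three steps are: add −3 times the first row to the second; multiply the second row by −1/2; add −2 times the second row to the first.

The three successive row operations can be performed by multiplying A on the left successively by

$$E_1 = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ 0 & -\tfrac{1}{2} \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$$

Inverting these matrices and using Formula 4 of Section 1.5 yields

$$A = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$

Reading from right to left and noting that

$$\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$

it follows that the effect of multiplying by A is equivalent to
shearing by a factor of 2 in the x-direction,
then expanding by a factor of 2 in the y-direction,
then reflecting about the x-axis,
then shearing by a factor of 3 in the y-direction.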
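Both halves of this factorization can be verified with exact rational arithmetic. A minimal Python sketch, assuming the standard-library `fractions` module for the entry −1/2 (the helper `mat_mul` and the matrix names are our own):

```python
from fractions import Fraction

def mat_mul(P, Q):
    """Product PQ of two 2x2 matrices given as lists of rows."""
    return [[sum(P[i][m] * Q[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
I = [[1, 0], [0, 1]]

# Elementary matrices for the three row operations in the solution.
E1 = [[1, 0], [-3, 1]]               # add -3 times row 1 to row 2
E2 = [[1, 0], [0, Fraction(-1, 2)]]  # multiply row 2 by -1/2
E3 = [[1, -2], [0, 1]]               # add -2 times row 2 to row 1
print(mat_mul(E3, mat_mul(E2, mat_mul(E1, A))) == I)  # True

# The geometric factors, applied right to left to a vector.
shear_y_3 = [[1, 0], [3, 1]]
reflect_x = [[1, 0], [0, -1]]
expand_y_2 = [[1, 0], [0, 2]]
shear_x_2 = [[1, 2], [0, 1]]
print(mat_mul(reflect_x, expand_y_2))  # [[1, 0], [0, -2]], the inverse of E2
print(mat_mul(shear_y_3, mat_mul(reflect_x, mat_mul(expand_y_2, shear_x_2))) == A)  # True
```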









































Images of Lines Under Matrix Operators 


Many images in computer graphics are constructed by connecting points with line segments. The following theorem, 
some of whose parts are proved in the exercises, is helpful for understanding how matrix operators transform such 
figures. 


THEOREM 4.11.3 

If $T: R^2 \to R^2$ is multiplication by an invertible matrix, then:

(a) The image of a straight line is a straight line. 

(b) The image of a straight line through the origin is a straight line through the origin. 

(c) The images of parallel straight lines are parallel straight lines. 

(d) The image of the line segment joining points P and Q is the line segment joining the images of P and Q. 

(e) The images of three points lie on a line if and only if the points themselves lie on a line. 


Note that it follows from Theorem 4.11.3 that if A is 
an invertible 2x2 matrix, then multiplication by A 
maps triangles into triangles and parallelograms into 
parallelograms. 


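EXAMPLE 5 Image of a Square

Sketch the image of the square with vertices (0, 0), (1, 0), (1, 1), and (0, 1) under multiplication by

$$A = \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}$$

Solution Since

$$\begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$

$$\begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

the image of the square is a parallelogram with vertices (0, 0), (−1, 2), (2, −1), and (1, 1) (Figure
4.11.5).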
Figure 4.11.5


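EXAMPLE 6 Image of a Line

According to Theorem 4.11.3, the invertible matrix

$$A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}$$

maps the line y = 2x + 1 into another line. Find its equation.

Solution Let (x, y) be a point on the line y = 2x + 1, and let (x', y') be its image under
multiplication by A. Then

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}^{-1}\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix}$$

so

$$x = x' - y'$$
$$y = -2x' + 3y'$$

Substituting in y = 2x + 1 yields

$$-2x' + 3y' = 2(x' - y') + 1 \quad \text{or equivalently} \quad y' = \tfrac{4}{5}x' + \tfrac{1}{5}$$

Thus (x', y') satisfies

$$y = \tfrac{4}{5}x + \tfrac{1}{5}$$

which is the equation we want.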
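A quick numerical check of Example 6: map several points of y = 2x + 1 by A and confirm that each image satisfies $y' = \tfrac{4}{5}x' + \tfrac{1}{5}$, written in integer form as 5y' = 4x' + 1 to avoid rounding. This Python sketch uses helper names of our own (`mat_vec`, `on_image_line`):

```python
A = [[3, 1],
     [2, 1]]

def mat_vec(M, v):
    """Multiply a 2x2 matrix M (list of rows) by a vector v in R^2."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def on_image_line(p):
    """True if p = (x', y') satisfies y' = (4/5)x' + 1/5, i.e. 5y' = 4x' + 1."""
    x_img, y_img = p
    return 5 * y_img == 4 * x_img + 1

# Sample integer points on the original line y = 2x + 1 and map them by A.
images = [mat_vec(A, (x, 2 * x + 1)) for x in range(-3, 4)]
print(all(on_image_line(p) for p in images))  # True
```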


























Concept Review 

Effect of a matrix operator on the unit square 
Geometry of one-to-one matrix operators 
Images of lines under matrix operators 

Skills 

Find standard matrices for geometric transformations of $R^2$.
Describe the geometric effect of an invertible matrix operator. 
Find the image of the unit square under a matrix operator. 
Find the image of a line under a matrix operator. 


Exercise Set 4.11 

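1. Find the standard matrix for the operator $T: R^2 \to R^2$ that maps a point (x, y) into

(a) its reflection about the line y = −x.

(b) its reflection through the origin.

(c) its orthogonal projection on the x-axis.

(d) its orthogonal projection on the y-axis.

Answer:

(a) $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$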


2. For each part of Exercise 1, use the matrix you have obtained to compute T(2, 1). Check your answers geometrically
by plotting the points (2, 1) and T(2, 1).

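3. Find the standard matrix for the operator $T: R^3 \to R^3$ that maps a point (x, y, z) into

(a) its reflection through the xy-plane.

(b) its reflection through the xz-plane.

(c) its reflection through the yz-plane.

Answer:

(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$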


4. For each part of Exercise 3, use the matrix you have obtained to compute T(1, 1, 1). Check your answers
geometrically by plotting the points (1, 1, 1) and T(1, 1, 1).

5. Find the standard matrix for the operator $T: R^3 \to R^3$ that

(a) rotates each vector 90° counterclockwise about the z-axis (looking along the positive z-axis toward the origin).

(b) rotates each vector 90° counterclockwise about the x-axis (looking along the positive x-axis toward the origin).

(c) rotates each vector 90° counterclockwise about the y-axis (looking along the positive y-axis toward the origin).

Answer:

(a) $\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{bmatrix}$


6. Sketch the image of the rectangle with vertices (0, 0), (1, 0), (1, 2), and (0, 2) under

(a) a reflection about the x-axis. 

(b) a reflection about the y-axis. 

(c) a compression of factor k = in the y-direction. 

(d) an expansion of factor k = 2 in the x-direction.

(e) a shear of factor k = 3 in the x-direction. 

(f) a shear of factor k = 2 in the y-direction.

7. Sketch the image of the square with vertices (0,0), (1,0), (0, 1), and (1,1) under multiplication by 



Answer: 

Rectangle with vertices at (0, 0), (−3, 0), (0, 1), (−3, 1)

8. Find the matrix that rotates a point (x, y) about the origin through

(a) 45°

(b) 90°

(c) 180°

(d) 270°

(e) −30°
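Each part of Exercise 8 amounts to evaluating the counterclockwise rotation matrix from Table 1 at one angle; a negative angle gives a clockwise rotation. A Python sketch using only the standard `math` module (`rotation_matrix` is a name of our own); entries are rounded for display, so tiny floating-point errors may show up as 0.0 or −0.0:

```python
import math

def rotation_matrix(theta_degrees):
    """Standard matrix for counterclockwise rotation about the origin through theta."""
    t = math.radians(theta_degrees)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]

for angle in (45, 90, 180, 270, -30):
    R = rotation_matrix(angle)
    # Round to suppress floating-point noise such as 6.1e-17 in place of 0.
    print(angle, [[round(entry, 4) for entry in row] for row in R])
```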


9. Find the matrix that shears by

(a) a factor of k = 4 in the y-direction.

(b) a factor of k = −2 in the x-direction.

Answer:

(a) $\begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$


10. Find the matrix that compresses or expands by 

( a ) a factor of -j in they-direction. 

(b) a factor of 6 in the x-direction. 


11. In each part, describe the geometric effect of multiplication by A.

(a) $A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$

Answer:

(a) Expansion by a factor of 3 in the x-direction

(b) Expansion by a factor of 5 in the y-direction and reflection about the x-axis

(c) Shearing by a factor of 4 in the x-direction


12. In each part, express the matrix as a product of elementary matrices, and then describe the effect of multiplication by 
A in terms of compressions, expansions, reflections, and shears. 


(a) 


(b) 


(c) 


(d) 


A = 


A = 


A = 


A = 


0 

3 

4 
9 

-2 

0 

-3 

6 


13. In each part, find a single matrix that performs the indicated succession of operations. 

(a) Compresses by a factor of -1 in the x-direction, then expands by a factor of 5 in the y-direction. 


(b) Expands by a factor of 5 in the y-direction, then shears by a factor of 2 in the y-direction.

(c) Reflects about y = x, then rotates through an angle of 180° about the origin. 


Answer: 



(a) 

(b) 

(c) 


0 5J 

1 0 " 

2 5 . 

0 -1 
-1 0 


14. In each part, find a single matrix that performs the indicated succession of operations. 

(a) Reflects about the y-axis, then expands by a factor of 5 in the x-direction, and then reflects about y = x.

(b) Rotates through 30° about the origin, then shears by a factor of −2 in the y-direction, and then expands by a
factor of 3 in the y-direction.


15. Use matrix inversion to show the following. 

(a) The inverse transformation for a reflection about y = x is a reflection about y = x. 

(b) The inverse transformation for a compression along an axis is an expansion along that axis. 

(c) The inverse transformation for a reflection about a coordinate axis is a reflection about that axis. 

(d) The inverse transformation for a shear along a coordinate axis is a shear along that axis. 


16. Find an equation of the image of the line y = −4x + 3 under multiplication by



17. In parts (a) through (e), find an equation of the image of the line y = 2x under

(a) a shear of factor 3 in the x-direction.

(b) a compression of factor in the y-direction.

(c) a reflection about y = x.

(d) a reflection about the y-axis.

(e) a rotation of 60° about the origin.


Answer:

(a) $y = \tfrac{2}{7}x$

(b) $y = x$

(c) $y = \tfrac{1}{2}x$

(d) $y = -2x$

(e) $y = -\dfrac{8 + 5\sqrt{3}}{11}\,x$
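Because y = 2x passes through the origin, its image under an invertible operator is the line through the origin in the direction of the image of (1, 2) (Theorem 4.11.3(b)). A Python sketch of that idea, checking answers (a), (d), and (e) above; `image_slope` is a helper name of our own:

```python
import math

def image_slope(M, direction):
    """Slope of the image of a line through the origin with the given direction vector."""
    x_img = M[0][0] * direction[0] + M[0][1] * direction[1]
    y_img = M[1][0] * direction[0] + M[1][1] * direction[1]
    return y_img / x_img

direction = (1, 2)  # y = 2x passes through the origin in direction (1, 2)

shear_x_3 = [[1, 3], [0, 1]]
reflect_y_axis = [[-1, 0], [0, 1]]
t = math.radians(60)
rotate_60 = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

print(image_slope(shear_x_3, direction))       # 0.2857142857142857, i.e. 2/7
print(image_slope(reflect_y_axis, direction))  # -2.0
print(math.isclose(image_slope(rotate_60, direction),
                   -(8 + 5 * math.sqrt(3)) / 11))  # True
```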


18. Find the matrix for a shear in the x-direction that transforms the triangle with vertices (0, 0), (2, 1), and (3, 0) into 
a right triangle with the right angle at the origin. 


19. (a) Show that multiplication by

$$A = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix}$$

maps each point in the plane onto the line y = 2x.

(b) It follows from part (a) that the noncollinear points (1, 0), (0, 1), (−1, 0) are mapped onto a line. Does this
violate part (e) of Theorem 4.11.3?

Answer: 

(b) No 

20. Prove part (a) of Theorem 4.11.3. [Hint: A line in the plane has an equation of the form Ax + By + C = 0, where A
and B are not both zero. Use the method of Example 6 to show that the image of this line under multiplication by the
invertible matrix

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

has the equation A'x + B'y + C = 0, where

$$A' = \frac{dA - cB}{ad - bc}$$

and

$$B' = \frac{-bA + aB}{ad - bc}$$

Then show that A' and B' are not both zero to conclude that the image is a line.]
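The formulas for A' and B' in the hint can be spot-checked against Example 6, where the line y = 2x + 1 (that is, 2x − y + 1 = 0) is mapped by the matrix with a = 3, b = 1, c = 2, d = 1. A Python sketch using exact fractions; the function name `image_line_coefficients` is our own:

```python
from fractions import Fraction

def image_line_coefficients(a, b, c, d, A, B):
    """Coefficients (A', B') from the hint to Exercise 20.

    The line Ax + By + C = 0 is mapped by [[a, b], [c, d]] onto
    A'x + B'y + C = 0; the constant C is unchanged. Requires ad - bc != 0.
    """
    det = a * d - b * c
    return Fraction(d * A - c * B, det), Fraction(-b * A + a * B, det)

# Example 6: the line 2x - y + 1 = 0 under [[3, 1], [2, 1]].
A_img, B_img = image_line_coefficients(3, 1, 2, 1, A=2, B=-1)
print(A_img, B_img)  # 4 -5
```

The image 4x − 5y + 1 = 0 is the line y = (4/5)x + 1/5, which agrees with Example 6.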

21. Use the hint in Exercise 20 to prove parts (b) and (c) of Theorem 4.11.3.

22. In each part of the accompanying figure, find the standard matrix for the operator described. 



23. In $R^3$ the shear in the xy-direction with factor k is the matrix transformation that moves each point (x, y, z) parallel
to the xy-plane to the new position (x + kz, y + kz, z). (See the accompanying figure.)

(a) Find the standard matrix for the shear in the xy-direction with factor k.

(b) How would you define the shear in the xz-direction with factor k and the shear in the yz-direction with factor k?
Find the standard matrices for these matrix transformations.



Figure Ex-23 




















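Answer:

(a) $\begin{bmatrix} 1 & 0 & k \\ 0 & 1 & k \\ 0 & 0 & 1 \end{bmatrix}$

(b) Shear in the xz-direction with factor k maps (x, y, z) to (x + ky, y, z + ky):

$\begin{bmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & k & 1 \end{bmatrix}$

Shear in the yz-direction with factor k maps (x, y, z) to (x, y + kx, z + kx):

$\begin{bmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ k & 0 & 1 \end{bmatrix}$

True-False Exercises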


In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The image of the unit square under a one-to-one matrix operator is a square. 

Answer: 

False 

(b) A 2 x 2 invertible matrix operator has the geometric effect of a succession of shears, compressions, expansions, and 
reflections. 

Answer: 

True 

(c) The image of a line under a one-to-one matrix operator is a line. 

Answer: 

True 

(d) Every reflection operator on $R^2$ is its own inverse.

Answer: 

True 

(e) The matrix


~[: -i 


represents reflection about a line. 


Answer: 

False 

(f) The matrix

$$\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$$

represents a shear.


Answer: 











False 


(g) The matrix

$$\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$$

represents an expansion.


Answer: 


True 
4.12 Dynamical Systems and Markov Chains

In this optional section we will show how matrix methods can be used to analyze the behavior of physical systems that 
evolve over time. The methods that we will study here have been applied to problems in business, ecology, 
demographics, sociology, and most of the physical sciences. 


Dynamical Systems 

A dynamical system is a finite set of variables whose values change with time. The value of a variable at a point in time 
is called the state of the variable at that time, and the vector formed from these states is called the state of the 
dynamical system at that time. Our primary objective in this section is to analyze how the state of a dynamical system 
changes with time. Let us begin with an example. 

EXAMPLE 1 Market Share as a Dynamical System 

Suppose that two competing television channels, channel 1 and channel 2, each have 50% of the viewer
market at some initial point in time. Assume that over each one-year period channel 1 captures 10% of
channel 2's share, and channel 2 captures 20% of channel 1's share (see Figure 4.12.1). What is each
channel's market share after one year?

Figure 4.12.1 Channel 1 loses 20% of its share to channel 2 and holds 80%; channel 2 loses 10% of its share to channel 1 and holds 90%.


Let us begin by introducing the time-dependent variables

$x_1(t)$ = fraction of the market held by channel 1 at time t

$x_2(t)$ = fraction of the market held by channel 2 at time t

and the column vector

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} \quad \begin{matrix} \leftarrow \text{Channel 1's fraction of the market at time } t \text{ in years} \\ \leftarrow \text{Channel 2's fraction of the market at time } t \text{ in years} \end{matrix}$$

The variables $x_1(t)$ and $x_2(t)$ form a dynamical system whose state at time t is the vector $\mathbf{x}(t)$. If we
take t = 0 to be the starting point at which the two channels had 50% of the market, then the state of the
system at that time is

$$\mathbf{x}(0) = \begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \quad \begin{matrix} \leftarrow \text{Channel 1's fraction of the market at time } t = 0 \\ \leftarrow \text{Channel 2's fraction of the market at time } t = 0 \end{matrix} \tag{1}$$


Now let us try to find the state of the system at time t = 1 (one year later). Over the one-year period,
channel 1 retains 80% of its initial 50%, and it gains 10% of channel 2's initial 50%. Thus,

$$x_1(1) = 0.8(0.5) + 0.1(0.5) = 0.45 \tag{2}$$

Similarly, channel 2 gains 20% of channel 1's initial 50%, and retains 90% of its initial 50%. Thus,

$$x_2(1) = 0.2(0.5) + 0.9(0.5) = 0.55 \tag{3}$$

Therefore, the state of the system at time t = 1 is

$$\mathbf{x}(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \quad \begin{matrix} \leftarrow \text{Channel 1's fraction of the market at time } t = 1 \\ \leftarrow \text{Channel 2's fraction of the market at time } t = 1 \end{matrix} \tag{4}$$


EXAMPLE 2 Evolution of Market Share over Five Years 

Track the market shares of channels 1 and 2 in Example 1 over a five-year period.

Solution To solve this problem suppose that we have already computed the market share of each
channel at time t = k and we are interested in using the known values of $x_1(k)$ and $x_2(k)$ to compute the
market shares $x_1(k+1)$ and $x_2(k+1)$ one year later. The analysis is exactly the same as that used to
obtain Equations 2 and 3. Over the one-year period, channel 1 retains 80% of its starting fraction $x_1(k)$
and gains 10% of channel 2's starting fraction $x_2(k)$. Thus,

$$x_1(k+1) = (0.8)x_1(k) + (0.1)x_2(k) \tag{5}$$

Similarly, channel 2 gains 20% of channel 1's starting fraction $x_1(k)$ and retains 90% of its own starting
fraction $x_2(k)$. Thus,

$$x_2(k+1) = (0.2)x_1(k) + (0.9)x_2(k) \tag{6}$$

Equations 5 and 6 can be expressed in matrix form as

$$\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} \tag{7}$$

which provides a way of using matrix multiplication to compute the state of the system at time t = k + 1
from the state at time t = k. For example, using 1 and 7 we obtain

$$\mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\mathbf{x}(0) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}$$

which agrees with 4. Similarly,

$$\mathbf{x}(2) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} = \begin{bmatrix} 0.415 \\ 0.585 \end{bmatrix}$$

We can now continue this process, using Formula 7 to compute x(3) from x(2), then x(4) from x(3),
and so on. This yields (verify)

$$\mathbf{x}(3) = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}, \quad \mathbf{x}(4) = \begin{bmatrix} 0.37335 \\ 0.62665 \end{bmatrix}, \quad \mathbf{x}(5) = \begin{bmatrix} 0.361345 \\ 0.638655 \end{bmatrix} \tag{8}$$

Thus, after five years, channel 1 will hold about 36% of the market and channel 2 will hold about 64% of
the market.
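The "(verify)" step above is a direct loop over Formula 7. A minimal Python sketch, assuming states are stored as plain lists (the helper name `step` is our own); the printed rounded states should reproduce x(1), x(2), and the values in (8):

```python
# Transition matrix from Formula (7); column j describes where channel j's viewers go.
P = [[0.8, 0.1],
     [0.2, 0.9]]

def step(P, x):
    """Advance the state one year: x(k+1) = P x(k)."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]

x = [0.5, 0.5]  # x(0) from Formula (1)
history = [x]
for _ in range(5):
    x = step(P, x)
    history.append(x)

for k, state in enumerate(history):
    print(k, [round(v, 6) for v in state])
```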


































If desired, we can continue the market analysis in the last example beyond the five-year period and explore what 
happens to the market share over the long term. We did so, using a computer, and obtained the following state vectors 
(rounded to six decimal places): 


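$$\mathbf{x}(10) \approx \begin{bmatrix} 0.338041 \\ 0.661959 \end{bmatrix}, \quad \mathbf{x}(20) \approx \begin{bmatrix} 0.333466 \\ 0.666534 \end{bmatrix}, \quad \mathbf{x}(40) \approx \begin{bmatrix} 0.333333 \\ 0.666667 \end{bmatrix} \tag{9}$$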


All subsequent state vectors, when rounded to six decimal places, are the same as x(40), so we see that the market 
shares eventually stabilize with channel 1 holding about one-third of the market and channel 2 holding about 
two-thirds. Later in this section, we will explain why this stabilization occurs. 
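The text does not say how the values in (9) were computed; a sketch under the assumption that Formula 7 is simply applied repeatedly (the function `state_after` is a name of our own) is:

```python
P = [[0.8, 0.1],
     [0.2, 0.9]]

def state_after(k, P, x0):
    """Return x(k) by applying x(j+1) = P x(j) a total of k times."""
    x = list(x0)
    n = len(x)
    for _ in range(k):
        x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

x0 = [0.5, 0.5]
for k in (10, 20, 40):
    print(k, [round(v, 6) for v in state_after(k, P, x0)])
```

The rounded output should match (9), with the entries approaching 1/3 and 2/3.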


Markov Chains 

In many dynamical systems the states of the variables are not known with certainty but can be expressed as
probabilities; such dynamical systems are called stochastic processes (from the Greek word stokastikos, meaning
"proceeding by guesswork"). A detailed study of stochastic processes requires a precise definition of the term
probability, which is outside the scope of this course. However, the following interpretation will suffice for our present
purposes:


Stated informally, the probability that an experiment or observation will have a certain outcome is
approximately the fraction of the time that the outcome would occur if the experiment were to be repeated many
times under constant conditions; the greater the number of repetitions, the more accurately the probability
describes the fraction of occurrences.


For example, when we say that the probability of tossing heads with a fair coin is 1/2, we mean that if the coin were
tossed many times under constant conditions, then we would expect about half of the outcomes to be heads.
Probabilities are often expressed as decimals or percentages. Thus, the probability of tossing heads with a fair coin can 
also be expressed as 0.5 or 50%. 


If an experiment or observation has n possible outcomes, then the probabilities of those outcomes must be nonnegative 
fractions whose sum is 1. The probabilities are nonnegative because each describes the fraction of occurrences of an 
outcome over the long term, and the sum is 1 because they account for all possible outcomes. For example, if a box 
containing 10 balls has one red ball, three green balls, and six yellow balls, and if a ball is drawn at random from the 
box, then the probabilities of the various outcomes are 

$p_1$ = prob(red) = 1/10 = 0.1
$p_2$ = prob(green) = 3/10 = 0.3
$p_3$ = prob(yellow) = 6/10 = 0.6

Each probability is a nonnegative fraction and

$$p_1 + p_2 + p_3 = 0.1 + 0.3 + 0.6 = 1$$

In a stochastic process with n possible states, the state vector at each time t has the form

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix} \quad \begin{matrix} \leftarrow \text{Probability that the system is in state 1} \\ \leftarrow \text{Probability that the system is in state 2} \\ \vdots \\ \leftarrow \text{Probability that the system is in state } n \end{matrix}$$

The entries in this vector must add up to 1 since they account for all n possibilities. In general, a vector with
nonnegative entries that add up to 1 is called a probability vector.

EXAMPLE 3 Example 1 Revisited from the Probability Viewpoint


Observe that the state vectors in Example 1 and Example 2 are all probability vectors. This is to be 
expected since the entries in each state vector are the fractional market shares of the channels, and together 
they account for the entire market. In practice, it is preferable to interpret the entries in the state vectors as 
probabilities rather than exact market fractions, since market information is usually obtained by statistical 
sampling procedures with intrinsic uncertainties. Thus, for example, the state vector 


$$\mathbf{x}(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}$$


which we interpreted in Example 1 to mean that channel 1 has 45% of the market and channel 2 has 55%, 
can also be interpreted to mean that an individual picked at random from the market will be a channel 1 
viewer with probability 0.45 and a channel 2 viewer with probability 0.55. 


A square matrix, each of whose columns is a probability vector, is called a stochastic matrix. Such matrices commonly
occur in formulas that relate successive states of a stochastic process. For example, the state vectors $\mathbf{x}(k+1)$ and $\mathbf{x}(k)$
in 7 are related by an equation of the form $\mathbf{x}(k+1) = P\mathbf{x}(k)$ in which

$$P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \tag{10}$$

is a stochastic matrix. It should not be surprising that the column vectors of P are probability vectors, since the entries
in each column provide a breakdown of what happens to each channel's market share over the year: the entries in
column 1 convey that each year channel 1 retains 80% of its market share and loses 20%; and the entries in column 2
convey that each year channel 2 retains 90% of its market share and loses 10%. The entries in 10 can also be viewed as
probabilities:

$p_{11} = 0.8$ = probability that a channel 1 viewer remains a channel 1 viewer

$p_{21} = 0.2$ = probability that a channel 1 viewer becomes a channel 2 viewer

$p_{12} = 0.1$ = probability that a channel 2 viewer becomes a channel 1 viewer

$p_{22} = 0.9$ = probability that a channel 2 viewer remains a channel 2 viewer
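The definition of a stochastic matrix is a condition on columns, not rows. A small Python checker (the names `is_probability_vector` and `is_stochastic`, and the floating-point tolerance, are our own choices) confirms that P in (10) qualifies while its transpose does not:

```python
def is_probability_vector(v, tol=1e-12):
    """Nonnegative entries summing to 1 (up to a floating-point tolerance)."""
    return all(entry >= 0 for entry in v) and abs(sum(v) - 1) <= tol

def is_stochastic(P):
    """A square matrix whose every column is a probability vector."""
    n = len(P)
    if any(len(row) != n for row in P):
        return False
    columns = [[P[i][j] for i in range(n)] for j in range(n)]
    return all(is_probability_vector(col) for col in columns)

P = [[0.8, 0.1],
     [0.2, 0.9]]  # Formula (10)

print(is_stochastic(P))  # True
# The transpose has column sums 0.9 and 1.1, so it is not stochastic.
print(is_stochastic([[0.8, 0.2], [0.1, 0.9]]))  # False
```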

Example 1 is a special case of a large class of stochastic processes, called Markov chains. 











Andrei Andreyevich Markov (1856-1922) 

Markov chains are named in honor of the Russian mathematician A. A. Markov, a lover of 
poetry, who used them to analyze the alternation of vowels and consonants in the poem Eugene Onegin by 
Pushkin. Markov believed that the only applications of his chains were to the analysis of literary works, so he 
would be astonished to learn that his discovery is used today in the social sciences, quantum theory, and 
genetics! 

[Image: wikipedia ] 


DEFINITION 1 

A Markov chain is a dynamical system whose state vectors at a succession of time intervals are probability 
vectors and for which the state vectors at successive time intervals are related by an equation of the form 

$$\mathbf{x}(k+1) = P\mathbf{x}(k)$$

in which $P = [p_{ij}]$ is a stochastic matrix and $p_{ij}$ is the probability that the system will be in state i at time
t = k + 1 if it is in state j at time t = k. The matrix P is called the transition matrix for the system.


Note that in this definition the row index i corresponds to the later state and the column index j to the earlier 
state (Figure 4.12.2). 


[Figure 4.12.2: columns are indexed by the state at time $t = k$ and rows by the state at time $t = k + 1$; the entry $p_{ij}$ is the probability that the system is in state $i$ at time $t = k + 1$ if it is in state $j$ at time $t = k$.]





EXAMPLE 4 Wildlife Migration as a Markov Chain 


Suppose that a tagged lion can migrate in search of food over three adjacent game reserves: reserve 1, reserve 2, and reserve 3. Based on data about the food resources, researchers conclude that the monthly migration pattern of the lion can be modeled by a Markov chain with transition matrix

$$P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}$$

(columns 1, 2, 3: reserve at time $t = k$; rows 1, 2, 3: reserve at time $t = k + 1$)


(see Figure 4.12.3). That is,

$p_{11} = 0.5$ = probability that the lion will stay in reserve 1 when it is in reserve 1

$p_{12} = 0.4$ = probability that the lion will move from reserve 2 to reserve 1

$p_{13} = 0.6$ = probability that the lion will move from reserve 3 to reserve 1

$p_{21} = 0.2$ = probability that the lion will move from reserve 1 to reserve 2

$p_{22} = 0.2$ = probability that the lion will stay in reserve 2 when it is in reserve 2

$p_{23} = 0.3$ = probability that the lion will move from reserve 3 to reserve 2

$p_{31} = 0.3$ = probability that the lion will move from reserve 1 to reserve 3

$p_{32} = 0.4$ = probability that the lion will move from reserve 2 to reserve 3

$p_{33} = 0.1$ = probability that the lion will stay in reserve 3 when it is in reserve 3

Assuming that $t$ is in months and the lion is released in reserve 2 at time $t = 0$, track its probable locations over a six-month period.



Let $x_1(k)$, $x_2(k)$, and $x_3(k)$ be the probabilities that the lion is in reserve 1, 2, or 3, respectively, at time $t = k$, and let

$$\mathbf{x}(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix}$$

be the state vector at that time. Since we know with certainty that the lion is in reserve 2 at time $t = 0$, the initial state vector is

$$\mathbf{x}(0) = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$


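We leave it for you to show that the state vectors over a six-month period are

$$\mathbf{x}(1) = P\mathbf{x}(0) = \begin{bmatrix} 0.400 \\ 0.200 \\ 0.400 \end{bmatrix}, \quad \mathbf{x}(2) = P\mathbf{x}(1) = \begin{bmatrix} 0.520 \\ 0.240 \\ 0.240 \end{bmatrix}, \quad \mathbf{x}(3) = P\mathbf{x}(2) = \begin{bmatrix} 0.500 \\ 0.224 \\ 0.276 \end{bmatrix}$$

$$\mathbf{x}(4) = P\mathbf{x}(3) \approx \begin{bmatrix} 0.505 \\ 0.228 \\ 0.267 \end{bmatrix}, \quad \mathbf{x}(5) = P\mathbf{x}(4) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}, \quad \mathbf{x}(6) = P\mathbf{x}(5) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}$$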


As in Example 2, the state vectors here seem to stabilize over time with a probability of approximately 
0.504 that the lion is in reserve 1, a probability of approximately 0.227 that it is in reserve 2, and a 
probability of approximately 0.269 that it is in reserve 3. 
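The six-month computation left for you above can be carried out with a few lines of plain Python. This is a sketch under our own conventions (matrices as lists of rows, a helper we named `mat_vec`); it simply applies $\mathbf{x}(k+1) = P\mathbf{x}(k)$ six times starting from $\mathbf{x}(0)$.

```python
def mat_vec(P, x):
    """Matrix-vector product P x, with P stored as a list of rows."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]


# Transition matrix of Example 4: column j = reserve now, row i = reserve next month
P = [[0.5, 0.4, 0.6],
     [0.2, 0.2, 0.3],
     [0.3, 0.4, 0.1]]
x = [0.0, 1.0, 0.0]                 # x(0): the lion is released in reserve 2

states = [x]
for month in range(6):              # x(k+1) = P x(k), for k = 0, 1, ..., 5
    x = mat_vec(P, x)
    states.append(x)

for k, state in enumerate(states):
    print(k, [round(v, 3) for v in state])
```

Rounded to three decimals, the stored vectors reproduce the state vectors listed above, including $\mathbf{x}(3) = (0.500, 0.224, 0.276)$.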


Markov Chains in Terms of Powers of the Transition Matrix 

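In a Markov chain with an initial state of $\mathbf{x}(0)$, the successive state vectors are

$$\mathbf{x}(1) = P\mathbf{x}(0), \quad \mathbf{x}(2) = P\mathbf{x}(1), \quad \mathbf{x}(3) = P\mathbf{x}(2), \quad \mathbf{x}(4) = P\mathbf{x}(3), \quad \ldots$$

For brevity, it is common to denote $\mathbf{x}(k)$ by $\mathbf{x}_k$, which allows us to write the successive state vectors more briefly as

$$\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P\mathbf{x}_1, \quad \mathbf{x}_3 = P\mathbf{x}_2, \quad \mathbf{x}_4 = P\mathbf{x}_3, \quad \ldots \tag{11}$$

Alternatively, these state vectors can be expressed in terms of the initial state vector $\mathbf{x}_0$ as

$$\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P(P\mathbf{x}_0) = P^2\mathbf{x}_0, \quad \mathbf{x}_3 = P(P^2\mathbf{x}_0) = P^3\mathbf{x}_0, \quad \mathbf{x}_4 = P(P^3\mathbf{x}_0) = P^4\mathbf{x}_0, \quad \ldots$$

from which it follows that

$$\mathbf{x}_k = P^k\mathbf{x}_0 \tag{12}$$

Note that Formula (12) makes it possible to compute the state vector $\mathbf{x}_k$ without first computing the earlier state vectors as required in Formula (11).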
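To see that Formulas (11) and (12) give the same state vectors, the sketch below computes $\mathbf{x}_6$ for Example 4 both ways. The helper names are hypothetical, and computing $P^k$ by repeated squaring is our own implementation choice (it needs only about $\log_2 k$ matrix products); the text does not prescribe any particular way of forming $P^k$.

```python
def mat_mul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_vec(P, x):
    """Matrix-vector product P x."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


def mat_pow(P, k):
    """P**k for an integer k >= 0, by repeated squaring."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    base = P
    while k > 0:
        if k & 1:                       # current binary digit of k is 1
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


def state_by_iteration(P, x0, k):
    """Formula (11): apply x_{j+1} = P x_j a total of k times."""
    x = list(x0)
    for _ in range(k):
        x = mat_vec(P, x)
    return x


def state_by_power(P, x0, k):
    """Formula (12): x_k = P^k x_0, without the intermediate states."""
    return mat_vec(mat_pow(P, k), x0)


P = [[0.5, 0.4, 0.6], [0.2, 0.2, 0.3], [0.3, 0.4, 0.1]]   # Example 4
x0 = [0.0, 1.0, 0.0]
print(state_by_iteration(P, x0, 6))
print(state_by_power(P, x0, 6))     # agrees up to floating-point roundoff
```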


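EXAMPLE 5 Finding a State Vector Directly from $\mathbf{x}_0$


Use Formula (12) to find the state vector $\mathbf{x}(3)$ in Example 2.


From (1) and (7), the initial state vector and transition matrix are

$$\mathbf{x}_0 = \mathbf{x}(0) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$$

We leave it for you to calculate $P^3$ and show that

$$\mathbf{x}_3 = P^3\mathbf{x}_0 = \begin{bmatrix} 0.562 & 0.219 \\ 0.438 & 0.781 \end{bmatrix}\begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}$$

which agrees with the result in (8).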
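The calculation of $P^3$ in Example 5 can be checked exactly with Python's `fractions.Fraction`, which avoids roundoff entirely. A minimal sketch (the helper `mat_mul` is ours):

```python
from fractions import Fraction as F


def mat_mul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


P = [[F(8, 10), F(1, 10)],
     [F(2, 10), F(9, 10)]]          # transition matrix of Example 2
x0 = [[F(1, 2)], [F(1, 2)]]         # initial state as a 2 x 1 column

P3 = mat_mul(mat_mul(P, P), P)
x3 = mat_mul(P3, x0)
print([[float(e) for e in row] for row in P3])   # [[0.562, 0.219], [0.438, 0.781]]
print([float(row[0]) for row in x3])             # [0.3905, 0.6095]
```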
























Long-Term Behavior of a Markov Chain 


We have seen two examples of Markov chains in which the state vectors seem to stabilize after a period of time. Thus, 
it is reasonable to ask whether all Markov chains have this property. The following example shows that this is not the 
case. 


EXAMPLE 6 A Markov Chain That Does Not Stabilize 


The matrix

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

is stochastic and hence can be regarded as the transition matrix for a Markov chain. A simple calculation shows that $P^2 = I$, from which it follows that

$$I = P^2 = P^4 = P^6 = \cdots \quad \text{and} \quad P = P^3 = P^5 = P^7 = \cdots$$

Thus, the successive states in the Markov chain with initial vector $\mathbf{x}_0$ are

$$\mathbf{x}_0,\ P\mathbf{x}_0,\ \mathbf{x}_0,\ P\mathbf{x}_0,\ \mathbf{x}_0,\ \ldots$$

which oscillate between $\mathbf{x}_0$ and $P\mathbf{x}_0$. Thus, the Markov chain does not stabilize unless both components of $\mathbf{x}_0$ are $\tfrac{1}{2}$ (verify).
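The oscillation in Example 6 is easy to observe numerically. The sketch below iterates the chain from a hypothetical initial vector $\mathbf{x}_0 = (0.3, 0.7)$, chosen only for illustration and not taken from the text.

```python
def mat_vec(P, x):
    """Matrix-vector product P x, with P stored as a list of rows."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


P = [[0, 1],
     [1, 0]]                  # the matrix of Example 6; P^2 = I
x = [0.3, 0.7]                # hypothetical initial probability vector

states = []
for k in range(5):
    states.append(x)
    x = mat_vec(P, x)
print(states)                 # alternates between [0.3, 0.7] and [0.7, 0.3]

print(mat_vec(P, [0.5, 0.5]))  # [0.5, 0.5]: the only probability vector P leaves fixed
```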


A precise definition of what it means for a sequence of numbers or vectors to stabilize is given in calculus; however, that level of precision will not be needed here. Stated informally, we will say that a sequence of vectors

$$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$$

approaches a limit $\mathbf{q}$ or that it converges to $\mathbf{q}$ if all entries in $\mathbf{x}_k$ can be made as close as we like to the corresponding entries in the vector $\mathbf{q}$ by taking $k$ sufficiently large. We denote this by writing $\mathbf{x}_k \to \mathbf{q}$ as $k \to \infty$.

We saw in Example 6 that the state vectors of a Markov chain need not approach a limit in all cases. However, by 
imposing a mild condition on the transition matrix of a Markov chain, we can guarantee that the state vectors will 
approach a limit. 




DEFINITION 2 

A stochastic matrix P is said to be regular if P or some positive power of P has all positive entries, and a 
Markov chain whose transition matrix is regular is said to be a regular Markov chain. 




EXAMPLE 7 Regular Stochastic Matrices 




The transition matrices in Example 2 and Example 4 are regular because their entries are positive. The matrix

$$P = \begin{bmatrix} 0.5 & 1 \\ 0.5 & 0 \end{bmatrix}$$

is regular because

$$P^2 = \begin{bmatrix} 0.75 & 0.5 \\ 0.25 & 0.5 \end{bmatrix}$$

has positive entries. The matrix $P$ in Example 6 is not regular because $P$ and every positive power of $P$ have some zero entries (verify).
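Definition 2 suggests a direct test: compute successive powers of $P$ and look for one with all positive entries. Because only the pattern of zero and positive entries affects whether an entry of a product of nonnegative matrices is positive, the sketch works with booleans. The search needs a stopping point; here we rely on Wielandt's theorem, a standard result not proved in this text, which says that if some power of an $n \times n$ nonnegative matrix has all positive entries, then the power $(n-1)^2 + 1$ already does. The function name `is_regular` is ours.

```python
def is_regular(P):
    """Return True if some positive power of the stochastic matrix P
    has all positive entries (Definition 2)."""
    n = len(P)
    pattern = [[p > 0 for p in row] for row in P]    # zero/positive pattern of P
    power = pattern                                  # pattern of P^1
    for _ in range((n - 1) ** 2 + 1):                # Wielandt's bound on the exponent
        if all(all(row) for row in power):
            return True
        # boolean product: (P^m P)_ij > 0 iff some (P^m)_it > 0 and P_tj > 0
        power = [[any(power[i][t] and pattern[t][j] for t in range(n)) for j in range(n)]
                 for i in range(n)]
    return False


print(is_regular([[0.5, 1], [0.5, 0]]))   # True  (Example 7: P^2 is positive)
print(is_regular([[0, 1], [1, 0]]))       # False (Example 6)
```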


The following theorem, which we state without proof, is the fundamental result about the long-term behavior of 
Markov chains. 


THEOREM 4.12.1 

If P is the transition matrix for a regular Markov chain, then: 

(a) There is a unique probability vector q such that Pq = q. 

(b) For any initial probability vector $\mathbf{x}_0$, the sequence of state vectors

$$\mathbf{x}_0,\ P\mathbf{x}_0,\ \ldots,\ P^k\mathbf{x}_0,\ \ldots$$

converges to $\mathbf{q}$.


The vector q in this theorem is called the steady-state vector of the Markov chain. It can be found by rewriting the 
equation in part (a) as 

$$(I - P)\mathbf{q} = \mathbf{0}$$

and then solving this equation for q subject to the requirement that q be a probability vector. Here are some examples. 


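EXAMPLE 8 Example 1 and Example 2 Revisited


The transition matrix for the Markov chain in Example 2 is

$$P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$$

Since the entries of $P$ are positive, the Markov chain is regular and hence has a unique steady-state vector $\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can write as

$$\begin{bmatrix} 0.2 & -0.1 \\ -0.2 & 0.1 \end{bmatrix}\begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

The general solution of this system is

$$q_1 = 0.5s, \quad q_2 = s$$

(verify), which we can write in vector form as

$$\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2}s \\ s \end{bmatrix} \tag{13}$$

For $\mathbf{q}$ to be a probability vector, we must have

$$1 = q_1 + q_2 = \tfrac{3}{2}s$$

which implies that $s = \tfrac{2}{3}$. Substituting this value in (13) yields the steady-state vector

$$\mathbf{q} = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}$$

which is consistent with the numerical results obtained in (9).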
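The procedure just illustrated (solve $(I - P)\mathbf{q} = \mathbf{0}$, then impose $q_1 + \cdots + q_n = 1$) can be sketched in Python with exact fractions. As a shortcut of our own, the sketch replaces the last equation of $(I - P)\mathbf{q} = \mathbf{0}$ by the normalization condition: the rows of $I - P$ add up to the zero row (because each column of $P$ sums to 1), so for a regular $P$ the discarded equation is redundant and the resulting square system has exactly one solution. Entries should be passed exactly (integers, `Fraction`s, or decimal strings such as `"0.8"`); the name `steady_state` is hypothetical.

```python
from fractions import Fraction as F


def steady_state(P):
    """Steady-state vector of a regular stochastic matrix P, computed exactly."""
    n = len(P)
    # augmented rows [I - P | 0] for the first n - 1 equations ...
    M = [[F(int(i == j)) - F(P[i][j]) for j in range(n)] + [F(0)] for i in range(n - 1)]
    # ... and the normalization q_1 + ... + q_n = 1 in place of the last one
    M.append([F(1)] * n + [F(1)])
    # Gauss-Jordan elimination in exact rational arithmetic
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


print([str(v) for v in steady_state([["0.8", "0.1"], ["0.2", "0.9"]])])   # ['1/3', '2/3']
```

Called on the transition matrix of Example 4, the same function returns the vector found by hand in Example 9.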


EXAMPLE 9 Example 4 Revisited 


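The transition matrix for the Markov chain in Example 4 is

$$P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}$$

Since the entries of $P$ are positive, the Markov chain is regular and hence has a unique steady-state vector $\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can write (using fractions) as

$$\begin{bmatrix} \tfrac{1}{2} & -\tfrac{2}{5} & -\tfrac{3}{5} \\ -\tfrac{1}{5} & \tfrac{4}{5} & -\tfrac{3}{10} \\ -\tfrac{3}{10} & -\tfrac{2}{5} & \tfrac{9}{10} \end{bmatrix}\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{14}$$

(We have converted to fractions to avoid roundoff error in this illustrative example.) We leave it for you to confirm that the reduced row echelon form of the coefficient matrix is

$$\begin{bmatrix} 1 & 0 & -\tfrac{15}{8} \\ 0 & 1 & -\tfrac{27}{32} \\ 0 & 0 & 0 \end{bmatrix}$$

and that the general solution of (14) is

$$q_1 = \tfrac{15}{8}s, \quad q_2 = \tfrac{27}{32}s, \quad q_3 = s \tag{15}$$

For $\mathbf{q}$ to be a probability vector we must have $q_1 + q_2 + q_3 = 1$, from which it follows that $s = \tfrac{32}{119}$ (verify). Substituting this value in (15) yields the steady-state vector

$$\mathbf{q} = \begin{bmatrix} 60/119 \\ 27/119 \\ 32/119 \end{bmatrix} \approx \begin{bmatrix} 0.5042 \\ 0.2269 \\ 0.2689 \end{bmatrix}$$

(verify), which is consistent with the results obtained in Example 4.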
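Part (b) of Theorem 4.12.1 can also be observed numerically: starting the lion in each reserve in turn and iterating, every run ends up within roundoff of the vector $\mathbf{q}$ found above. The number of steps (30) is an arbitrary choice for this sketch.

```python
def mat_vec(P, x):
    """Matrix-vector product P x, with P stored as a list of rows."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


P = [[0.5, 0.4, 0.6],
     [0.2, 0.2, 0.3],
     [0.3, 0.4, 0.1]]
q = [60 / 119, 27 / 119, 32 / 119]      # steady-state vector found in Example 9

errors = []
for start in range(3):                  # release the lion in reserve 1, 2, then 3
    x = [1.0 if i == start else 0.0 for i in range(3)]
    for _ in range(30):
        x = mat_vec(P, x)
    errors.append(max(abs(a - b) for a, b in zip(x, q)))
print(errors)                           # every entry is at roundoff level
```

The limit does not depend on the starting reserve, which is exactly what part (b) asserts.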


Concept Review 

Dynamical system 
State of a variable 
State of a dynamical system 
Stochastic process 
Probability 
Probability vector 
Stochastic matrix 
Markov chain 
Transition matrix 
Regular stochastic matrix 
Regular Markov chain 
Steady-state vector 

Skills 

Determine whether a matrix is stochastic. 

Compute the state vectors from a transition matrix and an initial state. 
Determine whether a stochastic matrix is regular. 

Determine whether a Markov chain is regular. 

Find the steady-state vector for a regular transition matrix. 


Exercise Set 4.12 


In Exercises 1-2, determine whether A is a stochastic matrix. If A is not stochastic, then explain why not. 


1. (a) $A = \begin{bmatrix} 0.4 & 0.3 \\ 0.6 & 0.7 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{bmatrix}$











(C) 


A = 


(d) 


A = 


1 

2 

0 

1 

2 

I 

3 

I 

3 

I 

3 


i 

3 

I 

3 

I 

3 


1 

2 
2 
2 

1 


Answer: 


(a) Stochastic 

(b) Not stochastic 

(c) Stochastic 

(d) Not stochastic 


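2. (a) $A = \begin{bmatrix} 0.2 & 0.9 \\ 0.8 & 0.1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0.2 & 0.8 \\ 0.9 & 0.1 \end{bmatrix}$

(c) $A = \begin{bmatrix} \tfrac{1}{12} & \tfrac{1}{9} & \tfrac{1}{6} \\ \tfrac{1}{2} & 0 & \tfrac{5}{6} \\ \tfrac{5}{12} & \tfrac{8}{9} & 0 \end{bmatrix}$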


(d) 


A = 


0 


2 


I I 

3 2 

I I 

3 2 


In Exercises 3-4, use Formulas (11) and (12) to compute the state vector $\mathbf{x}_4$ in two different ways.


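3. $P = \begin{bmatrix} 0.5 & 0.6 \\ 0.5 & 0.4 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$


Answer:


$\mathbf{x}_4 = \begin{bmatrix} 0.54545 \\ 0.45455 \end{bmatrix}$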


4. $P = \begin{bmatrix} 0.8 & 0.5 \\ 0.2 & 0.5 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$


In Exercises 5-6, determine whether P is a regular stochastic matrix. 



p= 


5 - (a) 


(b) 


(c) 


P = 


P = 


1 

7 

6 

7 

0 

1 


1 

0 


Answer: 


(a) Regular 

(b) Not regular 

(c) Regular 

6. (a) $P = \begin{bmatrix} \tfrac{3}{4} & \tfrac{1}{3} \\ \tfrac{1}{4} & \tfrac{2}{3} \end{bmatrix}$

(b) P =

(c) P =

In Exercises 7-10, verify that $P$ is a regular stochastic matrix, and find the steady-state vector for the associated Markov chain.



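7. $P = \begin{bmatrix} \tfrac{1}{4} & \tfrac{2}{3} \\ \tfrac{3}{4} & \tfrac{1}{3} \end{bmatrix}$


Answer:


$\mathbf{q} = \begin{bmatrix} 8/17 \\ 9/17 \end{bmatrix}$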


8. $P = \begin{bmatrix} 0.2 & 0.6 \\ 0.8 & 0.4 \end{bmatrix}$



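9. $P = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{4} & 0 & \tfrac{2}{3} \end{bmatrix}$


Answer:


$\mathbf{q} = \begin{bmatrix} 4/11 \\ 4/11 \\ 3/11 \end{bmatrix}$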


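10. $P = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{2}{5} \\ 0 & \tfrac{3}{4} & \tfrac{2}{5} \\ \tfrac{2}{3} & 0 & \tfrac{1}{5} \end{bmatrix}$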


11. Consider a Markov process with transition matrix 


State 1 State 2 

State 1 0.2 0.1 
State 2 0.8 0.9 


(a) What does the entry 0.2 represent? 

(b) What does the entry 0.1 represent? 

(c) If the system is in state 1 initially, what is the probability that it will be in state 2 at the next observation? 

(d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the 
next observation? 


Answer: 


(a) Probability that something in state 1 stays in state 1 

(b) Probability that something in state 2 moves to state 1 

(c) 0.8 

(d) 0.85 


12. Consider a Markov process with transition matrix 


State 1 State 2 


State 1 
State 2 


0 i 

> f 


(a) what does the entry y represent? 

(b) What does the entry 0 represent? 



(c) If the system is in state 1 initially, what is the probability that it will be in state 1 at the next observation? 

(d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the 
next observation? 

13. On a given day the air quality in a certain city is either good or bad. Records show that when the air quality is good 
on one day, then there is a 95% chance that it will be good the next day, and when the air quality is bad on one day, 
then there is a 45% chance that it will be bad the next day. 

(a) Find a transition matrix for this phenomenon. 

(b) If the air quality is good today, what is the probability that it will be good two days from now? 

(c) If the air quality is bad today, what is the probability that it will be bad three days from now? 

(d) If there is a 20% chance that the air quality will be good today, what is the probability that it will be good 
tomorrow? 

Answer: 

(a) $\begin{bmatrix} 0.95 & 0.55 \\ 0.05 & 0.45 \end{bmatrix}$

(b) 0.93 

(c) 0.142 

(d) 0.63 

14. In a laboratory experiment, a mouse can choose one of two food types each day, type I or type II. Records show that 
if the mouse chooses type I on a given day, then there is a 75% chance that it will choose type I the next day, and if 
it chooses type II on one day, then there is a 50% chance that it will choose type II the next day. 

(a) Find a transition matrix for this phenomenon. 

(b) If the mouse chooses type I today, what is the probability that it will choose type I two days from now? 

(c) If the mouse chooses type II today, what is the probability that it will choose type II three days from now? 

(d) If there is a 10% chance that the mouse will choose type I today, what is the probability that it will choose type 
I tomorrow? 

15. Suppose that at some initial point in time 100,000 people live in a certain city and 25,000 people live in its suburbs. 
The Regional Planning Commission determines that each year 5% of the city population moves to the suburbs and 
3% of the suburban population moves to the city. 

(a) Assuming that the total population remains constant, make a table that shows the populations of the city and its 
suburbs over a five-year period (round to the nearest integer). 

(b) Over the long term, how will the population be distributed between the city and its suburbs? 

Answer:

(b) City: 46,875; Suburbs: 78,125


16. Suppose that two competing television stations, station 1 and station 2, each have 50% of the viewer market at some 
initial point in time. Assume that over each one-year period station 1 captures 5% of station 2’s market share and 
station 2 captures 10% of station 1's market share.

(a) Make a table that shows the market share of each station over a five-year period. 

(b) Over the long term, how will the market share be distributed between the two stations? 

17. Suppose that a car rental agency has three locations, numbered 1, 2, and 3. A customer may rent a car from any of 
the three locations and return it to any of the three locations. Records show that cars are rented and returned in 
accordance with the following probabilities: 


                               Rented from Location
                               1       2       3
Returned to      1           1/10     1/5     3/5
Location         2            4/5    3/10     1/5
                 3           1/10     1/2     1/5


(a) Assuming that a car is rented from location 1, what is the probability that it will be at location 1 after two 
rentals? 

(b) Assuming that this dynamical system can be modeled as a Markov chain, find the steady-state vector. 

(c) If the rental agency owns 120 cars, how many parking spaces should it allocate at each location to be 
reasonably certain that it will have enough spaces for the cars over the long term? Explain your reasoning. 


Answer:


(a) $\tfrac{23}{100}$

(b) $\mathbf{q} = \begin{bmatrix} 46/159 \\ 22/53 \\ 47/159 \end{bmatrix}$

(c) 35, 50, 35


18. Physical traits are determined by the genes that an offspring receives from its parents. In the simplest case a trait in 
the offspring is determined by one pair of genes, one member of the pair inherited from the male parent and the 
other from the female parent. Typically, each gene in a pair can assume one of two forms, called alleles, denoted by 
A and a. This leads to three possible pairings: 

AA, Aa, aa 

called genotypes (the pairs Aa and aA determine the same trait and hence are not distinguished from one another). It 
is shown in the study of heredity that if a parent of known genotype is crossed with a random parent of unknown 
genotype, then the offspring will have the genotype probabilities given in the following table, which can be viewed 
as a transition matrix for a Markov process: 




                              Genotype of Parent
                              AA      Aa      aa
Genotype of       AA          1/2     1/4      0
Offspring         Aa          1/2     1/2     1/2
                  aa           0      1/4     1/2

Thus, for example, the offspring of a parent of genotype AA that is crossed at random with a parent of unknown genotype will have a 50% chance of being AA, a 50% chance of being Aa, and no chance of being aa.

(a) Show that the transition matrix is regular.

(b) Find the steady-state vector, and discuss its physical interpretation.


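19. Fill in the missing entries of the stochastic matrix

$$P = \begin{bmatrix} \tfrac{7}{10} & * & \tfrac{1}{5} \\ * & \tfrac{3}{10} & * \\ \tfrac{1}{10} & \tfrac{3}{5} & \tfrac{3}{10} \end{bmatrix}$$

and find its steady-state vector.


Answer:


$$P = \begin{bmatrix} \tfrac{7}{10} & \tfrac{1}{10} & \tfrac{1}{5} \\ \tfrac{1}{5} & \tfrac{3}{10} & \tfrac{1}{2} \\ \tfrac{1}{10} & \tfrac{3}{5} & \tfrac{3}{10} \end{bmatrix}; \quad \mathbf{q} = \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix}$$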


20. If $P$ is an $n \times n$ stochastic matrix, and if $M$ is a $1 \times n$ matrix whose entries are all 1's, then $MP$ = _

21. If $P$ is a regular stochastic matrix with steady-state vector $\mathbf{q}$, what can you say about the sequence of products

$$P\mathbf{q},\ P^2\mathbf{q},\ P^3\mathbf{q},\ \ldots,\ P^k\mathbf{q},\ \ldots$$

as $k \to \infty$?


Answer:

$P^k\mathbf{q} = \mathbf{q}$ for every positive integer $k$

22. (a) If $P$ is a regular $n \times n$ stochastic matrix with steady-state vector $\mathbf{q}$, and if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard unit vectors in column form, what can you say about the behavior of the sequence

$$P\mathbf{e}_i,\ P^2\mathbf{e}_i,\ P^3\mathbf{e}_i,\ \ldots,\ P^k\mathbf{e}_i,\ \ldots$$

as $k \to \infty$ for each $i = 1, 2, \ldots, n$?

(b) What does this tell you about the behavior of the column vectors of $P^k$ as $k \to \infty$?

23. Prove that the product of two stochastic matrices is a stochastic matrix. [Hint: Write each column of the product as a linear combination of the columns of the first factor.]




24. Prove that if $P$ is a stochastic matrix whose entries are all greater than or equal to $\rho$, then the entries of $P^2$ are greater than or equal to $\rho$.

True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) The vector $\begin{bmatrix} 1/3 \\ 0 \\ 2/3 \end{bmatrix}$ is a probability vector.

Answer: 


True 


(b) 


The matrix $\begin{bmatrix} 0.2 & 1 \\ 0.8 & 0 \end{bmatrix}$ is a regular stochastic matrix.


Answer: 

True 

(c) The column vectors of a transition matrix are probability vectors. 

Answer: 

True 

(d) A steady-state vector for a Markov chain with transition matrix $P$ is any solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$.
Answer: 

False 

(e) The square of every regular stochastic matrix is stochastic. 

Answer: 

True 
Supplementary Exercises


1. Let $V$ be the set of all ordered triples of real numbers, and consider the following addition and scalar multiplication operations on $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$:

$$\mathbf{u} + \mathbf{v} = (u_1 + v_1,\ u_2 + v_2,\ u_3 + v_3), \qquad k\mathbf{u} = (ku_1, 0, 0)$$

(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (3, -2, 4)$, $\mathbf{v} = (1, 5, -2)$, and $k = -1$.

(b) In words, explain why $V$ is closed under addition and scalar multiplication.

(c) Since the addition operation on $V$ is the standard addition operation on $R^3$, certain vector space axioms hold for $V$ because they are known to hold for $R^3$. Which axioms in Definition 1 of Section 4.1 are they?

(d) Show that Axioms 7, 8, and 9 hold.

(e) Show that Axiom 10 fails for the given operations.

Answer: 

(a) $\mathbf{u} + \mathbf{v} = (4, 3, 2)$, $k\mathbf{u} = (-3, 0, 0)$

(c) Axioms 1-5 

2. In each part, the solution space of the system is a subspace of $R^3$ and so must be a line through the origin, a plane through the origin, all of $R^3$, or the origin only. For each system, determine which is the case. If the subspace is a plane, find an equation for it, and if it is a line, find parametric equations.


(a) 

0 

+ 

$ 

+ 

II 

0 


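(b)
$$\begin{aligned} 2x - 3y + z &= 0 \\ 6x - 9y + 3z &= 0 \\ -4x + 6y - 2z &= 0 \end{aligned}$$

(c)
$$\begin{aligned} x - 2y + 7z &= 0 \\ -4x + 8y + 5z &= 0 \\ 2x - 4y + 3z &= 0 \end{aligned}$$

(d)
$$\begin{aligned} x + 4y + 8z &= 0 \\ 2x + 5y + 6z &= 0 \\ 3x + y - 4z &= 0 \end{aligned}$$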



3. For what values of $s$ is the solution space of

$$\begin{aligned} x_1 + x_2 + sx_3 &= 0 \\ x_1 + sx_2 + x_3 &= 0 \\ sx_1 + x_2 + x_3 &= 0 \end{aligned}$$

the origin only, a line through the origin, a plane through the origin, or all of $R^3$?


Answer:


If $s \neq 1, -2$, the solution space is the origin. If $s = 1$, the solution space is a plane through the origin. If $s = -2$, the solution space is a line through the origin.

4. (a) Express $(4a, a - b, a + 2b)$ as a linear combination of $(4, 1, 1)$ and $(0, -1, 2)$.

(b) Express $(3a + b + 3c, -a + 4b - c, 2a + b + 2c)$ as a linear combination of $(3, -1, 2)$ and $(1, 4, 1)$.

(c) Express $(2a - b + 4c, 3a - c, 4b + c)$ as a linear combination of three nonzero vectors.

5. Let $W$ be the space spanned by $\mathbf{f} = \sin x$ and $\mathbf{g} = \cos x$.

(a) Show that for any value of $\theta$, $\mathbf{f}_1 = \sin(x + \theta)$ and $\mathbf{g}_1 = \cos(x + \theta)$ are vectors in $W$.

(b) Show that $\mathbf{f}_1$ and $\mathbf{g}_1$ form a basis for $W$.

6. (a) Express $\mathbf{v} = (1, 1)$ as a linear combination of $\mathbf{v}_1 = (1, -1)$, $\mathbf{v}_2 = (3, 0)$, and $\mathbf{v}_3 = (2, 1)$ in two different ways.

(b) Explain why this does not violate Theorem 4.4.1.

7. Let $A$ be an $n \times n$ matrix, and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be linearly independent vectors in $R^n$ expressed as $n \times 1$ matrices. What must be true about $A$ for $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n$ to be linearly independent?

Answer: 


A must be invertible 

8. Must a basis for $P_n$ contain a polynomial of degree $k$ for each $k = 0, 1, 2, \ldots, n$? Justify your answer.

9. For the purpose of this exercise, let us define a “checkerboard matrix” to be a square matrix $A = [a_{ij}]$ such that

$$a_{ij} = \begin{cases} 1 & \text{if } i + j \text{ is even} \\ 0 & \text{if } i + j \text{ is odd} \end{cases}$$

Find the rank and nullity of the following checkerboard matrices.

(a) The $3 \times 3$ checkerboard matrix.

(b) The $4 \times 4$ checkerboard matrix.

(c) The $n \times n$ checkerboard matrix.


Answer:


(a) Rank = 2, nullity = 1

(b) Rank = 2, nullity = 2

(c) Rank = 2, nullity = $n - 2$


10. For the purpose of this exercise, let us define an “X-matrix” to be a square matrix with an odd number of rows and columns that has 0's everywhere except on the two diagonals where it has 1's. Find the rank and nullity of the following X-matrices.

(a) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$

(c) the X-matrix of size $(2n + 1) \times (2n + 1)$

11. In each part, show that the stated set of polynomials is a subspace of $P_n$ and find a basis for it.

(a) All polynomials in $P_n$ such that $p(-x) = p(x)$.

(b) All polynomials in $P_n$ such that $p(0) = 0$.

Answer:

(a) $\{1, x^2, x^4, \ldots, x^{2m}\}$, where $2m = n$ if $n$ is even and $2m = n - 1$ if $n$ is odd.

(b) $\{x, x^2, x^3, \ldots, x^n\}$

12. (Calculus required) Show that the set of all polynomials in $P_n$ that have a horizontal tangent at $x = 0$ is a subspace of $P_n$. Find a basis for this subspace.

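13. (a) Find a basis for the vector space of all $3 \times 3$ symmetric matrices.

(b) Find a basis for the vector space of all $3 \times 3$ skew-symmetric matrices.

Answer:

(a) $\left\{ \begin{bmatrix} 1&0&0\\0&0&0\\0&0&0 \end{bmatrix}, \begin{bmatrix} 0&1&0\\1&0&0\\0&0&0 \end{bmatrix}, \begin{bmatrix} 0&0&1\\0&0&0\\1&0&0 \end{bmatrix}, \begin{bmatrix} 0&0&0\\0&1&0\\0&0&0 \end{bmatrix}, \begin{bmatrix} 0&0&0\\0&0&1\\0&1&0 \end{bmatrix}, \begin{bmatrix} 0&0&0\\0&0&0\\0&0&1 \end{bmatrix} \right\}$

(b) $\left\{ \begin{bmatrix} 0&1&0\\-1&0&0\\0&0&0 \end{bmatrix}, \begin{bmatrix} 0&0&1\\0&0&0\\-1&0&0 \end{bmatrix}, \begin{bmatrix} 0&0&0\\0&0&1\\0&-1&0 \end{bmatrix} \right\}$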

14. Various advanced texts in linear algebra prove the following determinant criterion for rank: The rank of a matrix $A$ is $r$ if and only if $A$ has some $r \times r$ submatrix with a nonzero determinant, and all square submatrices of larger size have determinant zero. [Note: A submatrix of $A$ is any matrix obtained by deleting rows or columns of $A$. The matrix $A$ itself is also considered to be a submatrix of $A$.] In each part, use this criterion to find the rank of the matrix.

(a) $\begin{bmatrix} 1 & 2 & 0 \\ 2 & 4 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 & 1 \\ 2 & -1 & 3 \\ 3 & -1 & 4 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & -1 & 2 & 0 \\ 3 & 1 & 0 & 0 \\ -1 & 2 & 4 & 0 \end{bmatrix}$

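15. Use the result in Exercise 14 above to find the possible ranks for matrices of the form

$$\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & a_{16} \\ 0 & 0 & 0 & 0 & 0 & a_{26} \\ 0 & 0 & 0 & 0 & 0 & a_{36} \\ 0 & 0 & 0 & 0 & 0 & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \end{bmatrix}$$

Answer:

Possible ranks are 2, 1, and 0.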

16. Prove: If $S$ is a basis for a vector space $V$, then for any vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ and any scalar $k$, the following relationships hold.

(a) $(\mathbf{u} + \mathbf{v})_S = (\mathbf{u})_S + (\mathbf{v})_S$

(b) $(k\mathbf{u})_S = k(\mathbf{u})_S$
Eigenvalues and Eigenvectors


INTRODUCTION 

In this chapter we will focus on classes of scalars and vectors known as “eigenvalues” and 
“eigenvectors,” terms derived from the German word eigen, meaning “own,” “peculiar 
to,” “characteristic,” or “individual.” The underlying idea first appeared in the study of 
rotational motion but was later used to classify various kinds of surfaces and to describe 
solutions of certain differential equations. In the early 1900s it was applied to matrices and 
matrix transformations, and today it has applications in such diverse fields as computer 
graphics, mechanical vibrations, heat flow, population dynamics, quantum mechanics, and 
economics, to name just a few.
5.1 Eigenvalues and Eigenvectors

In this section we will define the notions of “eigenvalue” and “eigenvector” and discuss some of their basic 
properties. 


Definition of Eigenvalue and Eigenvector 

We begin with the main definition in this section. 


DEFINITION 1 

If $A$ is an $n \times n$ matrix, then a nonzero vector $\mathbf{x}$ in $R^n$ is called an eigenvector of $A$ (or of the matrix operator $T_A$) if $A\mathbf{x}$ is a scalar multiple of $\mathbf{x}$; that is,

$$A\mathbf{x} = \lambda\mathbf{x}$$

for some scalar $\lambda$. The scalar $\lambda$ is called an eigenvalue of $A$ (or of $T_A$), and $\mathbf{x}$ is said to be an eigenvector corresponding to $\lambda$.


The requirement that an eigenvector be nonzero is imposed to avoid the unimportant case $A\mathbf{0} = \lambda\mathbf{0}$, which holds for every $A$ and $\lambda$.


In general, the image of a vector $\mathbf{x}$ under multiplication by a square matrix $A$ differs from $\mathbf{x}$ in both magnitude and direction. However, in the special case where $\mathbf{x}$ is an eigenvector of $A$, multiplication by $A$ leaves the direction unchanged. For example, in $R^2$ or $R^3$ multiplication by $A$ maps each eigenvector $\mathbf{x}$ of $A$ (if any) along the same line through the origin as $\mathbf{x}$. Depending on the sign and magnitude of the eigenvalue $\lambda$ corresponding to $\mathbf{x}$, the operation $A\mathbf{x} = \lambda\mathbf{x}$ compresses or stretches $\mathbf{x}$ by a factor of $\lambda$, with a reversal of direction in the case where $\lambda$ is negative (Figure 5.1.1).

[Figure 5.1.1]


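EXAMPLE 1 Eigenvector of a 2 × 2 Matrix


The vector $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is an eigenvector of

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

corresponding to the eigenvalue $\lambda = 3$, since

$$A\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3\mathbf{x}$$

Geometrically, multiplication by $A$ has stretched the vector $\mathbf{x}$ by a factor of 3 (Figure 5.1.2).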
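Definition 1 gives a direct test for an eigenvalue-eigenvector pair: $\mathbf{x}$ must be nonzero and $A\mathbf{x}$ must equal $\lambda\mathbf{x}$. A small sketch (the function names and the tolerance are ours) confirms the computation of Example 1:

```python
def mat_vec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def is_eigenpair(A, x, lam, tol=1e-12):
    """True if x is nonzero and A x = lam x (entrywise, within tol)."""
    if all(abs(v) <= tol for v in x):
        return False                    # the zero vector is never an eigenvector
    return all(abs(ax - lam * xi) <= tol for ax, xi in zip(mat_vec(A, x), x))


A = [[3, 0],
     [8, -1]]
print(mat_vec(A, [1, 2]))               # [3, 6]
print(is_eigenpair(A, [1, 2], 3))       # True
```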



Computing Eigenvalues and Eigenvectors 

Our next objective is to obtain a general procedure for finding eigenvalues and eigenvectors of an $n \times n$ matrix $A$. We will begin with the problem of finding the eigenvalues of $A$. Note first that the equation $A\mathbf{x} = \lambda\mathbf{x}$ can be rewritten as $A\mathbf{x} = \lambda I\mathbf{x}$, or equivalently, as

$$(\lambda I - A)\mathbf{x} = \mathbf{0}$$

For $\lambda$ to be an eigenvalue of $A$ this equation must have a nonzero solution for $\mathbf{x}$. But it follows from parts (b) and (g) of Theorem 4.10.4 that this is so if and only if the coefficient matrix $\lambda I - A$ has a zero determinant. Thus, we have the following result.


THEOREM 5.1.1 

If $A$ is an $n \times n$ matrix, then $\lambda$ is an eigenvalue of $A$ if and only if it satisfies the equation

$$\det(\lambda I - A) = 0 \tag{1}$$

This is called the characteristic equation of $A$.















EXAMPLE 2 Finding Eigenvalues 


In Example 1 we observed that $\lambda = 3$ is an eigenvalue of the matrix

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

but we did not explain how we found it. Use the characteristic equation to find all eigenvalues of this matrix.


It follows from Formula (1) that the eigenvalues of $A$ are the solutions of the equation $\det(\lambda I - A) = 0$, which we can write as

$$\begin{vmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{vmatrix} = 0$$

from which we obtain

$$(\lambda - 3)(\lambda + 1) = 0 \tag{2}$$

This shows that the eigenvalues of $A$ are $\lambda = 3$ and $\lambda = -1$. Thus, in addition to the eigenvalue $\lambda = 3$ noted in Example 1, we have discovered a second eigenvalue $\lambda = -1$.
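The same expansion works for any $2 \times 2$ matrix. Writing a general $2 \times 2$ matrix as $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ (our notation, introduced only for this derivation), the characteristic equation becomes a quadratic whose coefficients are the trace and the determinant of $A$:

```latex
\det(\lambda I - A)
  = \begin{vmatrix} \lambda - a & -b \\ -c & \lambda - d \end{vmatrix}
  = (\lambda - a)(\lambda - d) - bc
  = \lambda^2 - (a + d)\lambda + (ad - bc)
```

With $a = 3$, $b = 0$, $c = 8$, $d = -1$ this gives $\lambda^2 - 2\lambda - 3 = 0$, which factors as (2).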


When the determinant $\det(\lambda I - A)$ that appears on the left side of (1) is expanded, the result is a polynomial $p(\lambda)$ of degree $n$ that is called the characteristic polynomial of $A$. For example, it follows from (2) that the characteristic polynomial of the $2 \times 2$ matrix $A$ in Example 2 is

$$p(\lambda) = (\lambda - 3)(\lambda + 1) = \lambda^2 - 2\lambda - 3$$

which is a polynomial of degree 2. In general, the characteristic polynomial of an $n \times n$ matrix has the form

$$p(\lambda) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n$$

in which the coefficient of $\lambda^n$ is 1 (Exercise 17). Since a polynomial of degree $n$ has at most $n$ distinct roots, it follows that the equation

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \tag{3}$$

has at most $n$ distinct solutions and consequently that an $n \times n$ matrix has at most $n$ distinct eigenvalues. Since some of these solutions may be complex numbers, it is possible for a matrix to have complex eigenvalues, even if that matrix itself has real entries. We will discuss this issue in more detail later, but for now we will focus on examples in which the eigenvalues are real numbers.
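For small matrices the coefficients $c_1, \ldots, c_n$ in (3) can be computed mechanically. The sketch below uses the Faddeev-LeVerrier recurrence, a standard algorithm that we substitute here for the determinant expansion used in the text; it produces the coefficients of $\det(\lambda I - A)$ without any symbolic algebra. Exact `Fraction` arithmetic keeps integer coefficients exact. The function names are ours.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def char_poly(A):
    """Return [1, c1, ..., cn] with det(lam*I - A) = lam^n + c1*lam^(n-1) + ... + cn.

    Faddeev-LeVerrier recurrence, starting from c0 = 1 and M0 = 0:
        M_k = A M_{k-1} + c_{k-1} I,    c_k = -trace(A M_k) / k
    """
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs


print([str(c) for c in char_poly([[3, 0], [8, -1]])])   # ['1', '-2', '-3']
```

For the matrix of Example 2 this reproduces $p(\lambda) = \lambda^2 - 2\lambda - 3$. The recurrence costs $n$ pairs of matrix products, so, like the determinant expansion, it is meant for small illustrative matrices.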


EXAMPLE 3 Eigenvalues of a 3 × 3 Matrix


Find the eigenvalues of

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix}$$


The characteristic polynomial of $A$ is

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda & -1 & 0 \\ 0 & \lambda & -1 \\ -4 & 17 & \lambda - 8 \end{bmatrix} = \lambda^3 - 8\lambda^2 + 17\lambda - 4$$

The eigenvalues of $A$ must therefore satisfy the cubic equation

$$\lambda^3 - 8\lambda^2 + 17\lambda - 4 = 0 \tag{4}$$

To solve this equation, we will begin by searching for integer solutions. This task can be simplified by exploiting the fact that all integer solutions (if there are any) of a polynomial equation with integer coefficients

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$$

must be divisors of the constant term, $c_n$. Thus, the only possible integer solutions of (4) are the divisors of $-4$, that is, $\pm 1, \pm 2, \pm 4$. Successively substituting these values in (4) shows that $\lambda = 4$ is an integer solution. As a consequence, $\lambda - 4$ must be a factor of the left side of (4). Dividing $\lambda - 4$ into $\lambda^3 - 8\lambda^2 + 17\lambda - 4$ shows that (4) can be rewritten as

$$(\lambda - 4)(\lambda^2 - 4\lambda + 1) = 0$$

Thus, the remaining solutions of (4) satisfy the quadratic equation

$$\lambda^2 - 4\lambda + 1 = 0$$

which can be solved by the quadratic formula. Thus the eigenvalues of $A$ are

$$\lambda = 4, \quad \lambda = 2 + \sqrt{3}, \quad \lambda = 2 - \sqrt{3}$$

In applications involving large matrices it is often not feasible to compute the characteristic equation directly, so other methods must be used to find eigenvalues. We will consider such methods in Chapter 9.
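The integer-root search and the division by $\lambda - 4$ in Example 3 can be automated. This sketch assumes a monic polynomial with integer coefficients and a nonzero constant term, stored as a coefficient list $[1, c_1, \ldots, c_n]$; it tests each divisor of $c_n$ with Horner's rule and then removes the root it finds by synthetic division. All function names are ours.

```python
import math


def evaluate(coeffs, x):
    """Value of coeffs[0]*x^n + ... + coeffs[n] at x, by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial with nonzero constant term:
    every such root divides the constant term coeffs[-1]."""
    m = abs(coeffs[-1])
    divisors = [d for d in range(1, m + 1) if m % d == 0]
    candidates = [s * d for d in divisors for s in (1, -1)]
    return [r for r in candidates if evaluate(coeffs, r) == 0]


def deflate(coeffs, r):
    """Synthetic division by (x - r); returns the quotient's coefficients
    (the remainder is dropped, so r should be a root)."""
    quotient = [coeffs[0]]
    for c in coeffs[1:-1]:
        quotient.append(c + r * quotient[-1])
    return quotient


p = [1, -8, 17, -4]                       # left side of equation (4)
roots = integer_roots(p)                  # candidates tried: +-1, +-2, +-4
a, b, c = deflate(p, roots[0])            # coefficients of the quadratic factor
disc = b * b - 4 * a * c
quadratic_roots = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]
print(roots, [a, b, c], quadratic_roots)
```

The quadratic step assumes a nonnegative discriminant, which holds here ($16 - 4 = 12$); complex eigenvalues would need `cmath.sqrt` instead.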


EXAMPLE 4 Eigenvalues of an Upper Triangular Matrix 


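Find the eigenvalues of the upper triangular matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}$$


Recalling that the determinant of a triangular matrix is the product of the entries on the main diagonal (Theorem 2.1.2), we obtain

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - a_{11} & -a_{12} & -a_{13} & -a_{14} \\ 0 & \lambda - a_{22} & -a_{23} & -a_{24} \\ 0 & 0 & \lambda - a_{33} & -a_{34} \\ 0 & 0 & 0 & \lambda - a_{44} \end{bmatrix} = (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44})$$

Thus, the characteristic equation is

$$(\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) = 0$$

and the eigenvalues are

$$\lambda = a_{11}, \quad \lambda = a_{22}, \quad \lambda = a_{33}, \quad \lambda = a_{44}$$

which are precisely the diagonal entries of $A$.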


The following general theorem should be evident from the computations in the preceding example. 


THEOREM 5.1.2 

If $A$ is an $n \times n$ triangular matrix (upper triangular, lower triangular, or diagonal), then the eigenvalues of $A$ are the entries on the main diagonal of $A$.
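Theorem 5.1.2 can be spot-checked with exact arithmetic: for a triangular $A$, $\det(\lambda I - A)$ should vanish at each diagonal entry and equal the product of the factors $\lambda - a_{ii}$ everywhere else. The matrix below is a hypothetical upper triangular example chosen for illustration, and `det` uses Gaussian elimination with `Fraction`s, which is our choice of method rather than the text's.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)            # no pivot in this column: singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result              # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return result


def char_value(A, lam):
    """det(lam*I - A)."""
    n = len(A)
    return det([[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)])


# A hypothetical upper triangular matrix (not from the text)
A = [[2, 7, -1],
     [0, Fraction(1, 2), 4],
     [0, 0, -3]]
for lam in (2, Fraction(1, 2), -3):
    print(lam, char_value(A, lam))       # 0 at each diagonal entry
print(char_value(A, 1))                  # -2, i.e. (1 - 2)(1 - 1/2)(1 + 3)
```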


EXAMPLE 5 Eigenvalues of a Lower Triangular Matrix 


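By inspection, the eigenvalues of the lower triangular matrix

$$A = \begin{bmatrix} \tfrac{1}{2} & 0 & 0 \\ -1 & \tfrac{2}{3} & 0 \\ 5 & -8 & -\tfrac{1}{4} \end{bmatrix}$$

are $\lambda = \tfrac{1}{2}$, $\lambda = \tfrac{2}{3}$, and $\lambda = -\tfrac{1}{4}$.


Had Theorem 5.1.2 been available earlier, we could have anticipated the result obtained in Example 2.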






THEOREM 5.1.3 


If $A$ is an $n \times n$ matrix, the following statements are equivalent.

(a) $\lambda$ is an eigenvalue of $A$.

(b) The system of equations $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has nontrivial solutions.

(c) There is a nonzero vector $\mathbf{x}$ such that $A\mathbf{x} = \lambda\mathbf{x}$.

(d) $\lambda$ is a solution of the characteristic equation $\det(\lambda I - A) = 0$.


Finding Eigenvectors and Bases for Eigenspaces 

Now that we know how to find the eigenvalues of a matrix, we will consider the problem of finding the
corresponding eigenvectors. Since the eigenvectors corresponding to an eigenvalue $\lambda$ of a matrix $A$ are the
nonzero vectors that satisfy the equation

$$(\lambda I - A)\mathbf{x} = \mathbf{0}$$

these eigenvectors are the nonzero vectors in the null space of the matrix $\lambda I - A$. We call this null space the
eigenspace of $A$ corresponding to $\lambda$. Stated another way, the eigenspace of $A$ corresponding to the eigenvalue
$\lambda$ is the solution space of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$.

Notice that $\mathbf{x} = \mathbf{0}$ is in every eigenspace even though it is not an eigenvector. Thus, it is the
nonzero vectors in an eigenspace that are the eigenvectors.
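Because an eigenspace is just the null space of $\lambda I - A$, a basis can be read off from the reduced row echelon form, with one basis vector per free variable. The following is a minimal sketch using exact fractions; `null_space_basis` and `eigenspace_basis` are hypothetical helper names, and the vectors they return may differ from a hand computation by ordering or scaling, which does not affect whether they form a basis.

```python
from fractions import Fraction

def null_space_basis(M):
    """Basis for the solution space of M x = 0, via reduced row echelon form."""
    rows = [[Fraction(v) for v in row] for row in M]
    m, n = len(rows), len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n):
        # Look for a nonzero entry in column c at or below row r.
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][c]
        rows[r] = [v / p for v in rows[r]]          # leading 1
        for i in range(m):                           # clear the rest of column c
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == m:
            break
    basis = []
    for fc in (c for c in range(n) if c not in pivot_cols):
        x = [Fraction(0)] * n
        x[fc] = Fraction(1)                          # set this free variable to 1
        for i, pc in enumerate(pivot_cols):
            x[pc] = -rows[i][fc]                     # solve for the leading variables
        basis.append(x)
    return basis

def eigenspace_basis(A, lam):
    """Basis for the eigenspace of A for lam, i.e. the null space of lam*I - A."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return null_space_basis(M)
```

For instance, `eigenspace_basis([[3, 0], [8, -1]], 3)` returns a single vector proportional to $(\tfrac{1}{2}, 1)$, which is the computation carried out by hand in Example 6.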


EXAMPLE 6 Bases for Eigenspaces 


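Find bases for the eigenspaces of the matrix

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

In Example 1 we found the characteristic equation of $A$ to be

$$(\lambda - 3)(\lambda + 1) = 0$$

from which we obtained the eigenvalues $\lambda = 3$ and $\lambda = -1$. Thus, there are two eigenspaces
of $A$, one corresponding to each of these eigenvalues.

By definition,

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

is an eigenvector of $A$ corresponding to an eigenvalue $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution
of $(\lambda I - A)\mathbf{x} = \mathbf{0}$, that is, of

$$\begin{bmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

If $\lambda = 3$, then this equation becomes

$$\begin{bmatrix} 0 & 0 \\ -8 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

whose general solution is

$$x_1 = \tfrac{1}{2}t, \quad x_2 = t$$

(verify) or in matrix form,

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2}t \\ t \end{bmatrix} = t\begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix}$$

Thus,

$$\begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = 3$. We leave it as an exercise for you to
follow the pattern of these computations and show that

$$\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = -1$.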



Methods of linear algebra are used in the emerging field of computerized face 
recognition. Researchers are working with the idea that every human face in a racial group is a 
combination of a few dozen primary shapes. For example, by analyzing three-dimensional scans of 
many faces, researchers at Rockefeller University have produced both an average head shape in the
Caucasian group, dubbed the meanhead (top row left in the figure to the left), and a set of
standardized variations from that shape, called eigenheads (15 of which are shown in the picture).
These are so named because they are eigenvectors of a certain matrix that stores digitized facial
information. Face shapes are represented mathematically as linear combinations of the eigenheads.
[Image: Courtesy Dr. Joseph Atick, Dr Norman Redlich, and Dr Paul Griffith ] 


EXAMPLE 7 Eigenvectors and Bases for Eigenspaces 


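Find bases for the eigenspaces of

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

The characteristic equation of $A$ is $\lambda^3 - 5\lambda^2 + 8\lambda - 4 = 0$, or in factored form,
$(\lambda - 1)(\lambda - 2)^2 = 0$ (verify). Thus, the distinct eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 2$, so there
are two eigenspaces of $A$.

By definition,

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

is an eigenvector of $A$ corresponding to $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution of
$(\lambda I - A)\mathbf{x} = \mathbf{0}$, or in matrix form,

$$\begin{bmatrix} \lambda & 0 & 2 \\ -1 & \lambda - 2 & -1 \\ -1 & 0 & \lambda - 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad (5)$$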


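In the case where $\lambda = 2$, Formula (5) becomes

$$\begin{bmatrix} 2 & 0 & 2 \\ -1 & 0 & -1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Solving this system using Gaussian elimination yields (verify)

$$x_1 = -s, \quad x_2 = t, \quad x_3 = s$$

Thus, the eigenvectors of $A$ corresponding to $\lambda = 2$ are the nonzero vectors of the form

$$\mathbf{x} = \begin{bmatrix} -s \\ t \\ s \end{bmatrix} = \begin{bmatrix} -s \\ 0 \\ s \end{bmatrix} + \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix} = s\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + t\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

Since

$$\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

are linearly independent (why?), these vectors form a basis for the eigenspace corresponding to
$\lambda = 2$.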

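If $\lambda = 1$, then (5) becomes

$$\begin{bmatrix} 1 & 0 & 2 \\ -1 & -1 & -1 \\ -1 & 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Solving this system yields (verify)

$$x_1 = -2s, \quad x_2 = s, \quad x_3 = s$$

Thus, the eigenvectors corresponding to $\lambda = 1$ are the nonzero vectors of the form

$$\begin{bmatrix} -2s \\ s \\ s \end{bmatrix} = s\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

so that

$$\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = 1$.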
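The bases found in Example 7 can be confirmed directly from the definition by checking $A\mathbf{x} = \lambda\mathbf{x}$ for each basis vector. The sketch below does exactly that with integer arithmetic; `mat_vec` and `is_eigenpair` are hypothetical helper names.

```python
def mat_vec(A, x):
    """Matrix-vector product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def is_eigenpair(A, lam, x):
    """True if x is a nonzero vector with A x = lam x."""
    return any(v != 0 for v in x) and mat_vec(A, x) == [lam * v for v in x]

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
bases = {2: [[-1, 0, 1], [0, 1, 0]], 1: [[-2, 1, 1]]}
checks = {lam: [is_eigenpair(A, lam, v) for v in vectors] for lam, vectors in bases.items()}
```

A check of this kind only confirms that each vector is an eigenvector; showing that the vectors span the whole eigenspace still requires the solution of the homogeneous system given in the example.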


Powers of a Matrix 

Once the eigenvalues and eigenvectors of a matrix $A$ are found, it is a simple matter to find the eigenvalues
and eigenvectors of any positive integer power of $A$; for example, if $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a
corresponding eigenvector, then

$$A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}$$

which shows that $\lambda^2$ is an eigenvalue of $A^2$ and that $\mathbf{x}$ is a corresponding eigenvector. In general, we have the
following result.


THEOREM 5.1.4

If $k$ is a positive integer, $\lambda$ is an eigenvalue of a matrix $A$, and $\mathbf{x}$ is a corresponding eigenvector, then
$\lambda^k$ is an eigenvalue of $A^k$ and $\mathbf{x}$ is a corresponding eigenvector.


EXAMPLE 8 Powers of a Matrix 


















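In Example 7 we showed that the eigenvalues of

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

are $\lambda = 2$ and $\lambda = 1$, so from Theorem 5.1.4 both $\lambda = 2^7 = 128$ and $\lambda = 1^7 = 1$ are eigenvalues of
$A^7$. We also showed that

$$\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

are eigenvectors of $A$ corresponding to the eigenvalue $\lambda = 2$, so from Theorem 5.1.4 they are also
eigenvectors of $A^7$ corresponding to $\lambda = 2^7 = 128$. Similarly, the eigenvector

$$\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

of $A$ corresponding to the eigenvalue $\lambda = 1$ is also an eigenvector of $A^7$ corresponding to
$\lambda = 1^7 = 1$.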
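Example 8 can be checked by brute force: form $A^7$ by repeated multiplication and apply it to each eigenvector. The sketch below uses plain integer lists; `mat_mul`, `mat_pow`, and `mat_vec` are hypothetical helper names.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def mat_pow(A, k):
    """A^k for a positive integer k by repeated multiplication, starting from I."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
A7 = mat_pow(A, 7)
# Theorem 5.1.4 predicts A^7 x = 2^7 x for the lambda = 2 eigenvectors
# and A^7 x = 1^7 x for the lambda = 1 eigenvector.
images = {tuple(v): mat_vec(A7, v) for v in ([-1, 0, 1], [0, 1, 0], [-2, 1, 1])}
```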


Eigenvalues and Invertibility 

The next theorem establishes a relationship between eigenvalues and the invertibility of a matrix. 


THEOREM 5.1.5 

A square matrix $A$ is invertible if and only if $\lambda = 0$ is not an eigenvalue of $A$.


Proof Assume that $A$ is an $n \times n$ matrix and observe first that $\lambda = 0$ is a solution of the characteristic
equation

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$$

if and only if the constant term $c_n$ is zero. Thus, it suffices to prove that $A$ is invertible if and only if $c_n \neq 0$.
But

$$\det(\lambda I - A) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n$$

or, on setting $\lambda = 0$,

$$\det(-A) = c_n \quad\text{or}\quad (-1)^n\det(A) = c_n$$

It follows from the last equation that $\det(A) = 0$ if and only if $c_n = 0$, and this in turn implies that $A$ is
invertible if and only if $c_n \neq 0$.


EXAMPLE 9 Eigenvalues and Invertibility 

The matrix $A$ in Example 7 is invertible since it has eigenvalues $\lambda = 1$ and $\lambda = 2$, neither of which
is zero. We leave it for you to check this conclusion by showing that $\det(A) \neq 0$.
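The key identity in the proof of Theorem 5.1.5, $c_n = \det(-A) = (-1)^n\det(A)$, can be checked on the matrix of Example 7, whose characteristic polynomial $\lambda^3 - 5\lambda^2 + 8\lambda - 4$ has constant term $-4$. The sketch below uses cofactor expansion along the first row, which is adequate only for small matrices; the function name `det` is a hypothetical helper.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
n = len(A)
det_A = det(A)
# Setting lambda = 0 in det(lambda*I - A) gives det(-A), the constant term c_n.
c_n = det([[-a for a in row] for row in A])
```

Here `det_A` is nonzero, which confirms that the matrix is invertible, and `c_n` equals $(-1)^3 \det(A)$.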


More on the Equivalence Theorem 

As our final result in this section, we will use Theorem 5.1.5 to add one additional part to Theorem 4.10.4. 


THEOREM 5.1.6 Equivalent Statements

If $A$ is an $n \times n$ matrix, then the following statements are equivalent.

(a) $A$ is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \neq 0$.

(h) The column vectors of $A$ are linearly independent.

(i) The row vectors of $A$ are linearly independent.

(j) The column vectors of $A$ span $R^n$.

(k) The row vectors of $A$ span $R^n$.

(l) The column vectors of $A$ form a basis for $R^n$.

(m) The row vectors of $A$ form a basis for $R^n$.

(n) $A$ has rank $n$.

(o) $A$ has nullity 0.

(p) The orthogonal complement of the null space of $A$ is $R^n$.

(q) The orthogonal complement of the row space of $A$ is $\{\mathbf{0}\}$.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.

(t) $\lambda = 0$ is not an eigenvalue of $A$.


This theorem relates all of the major topics we have studied thus far. 


Concept Review 

Eigenvector 

Eigenvalue 

Characteristic equation 
Characteristic polynomial 
Eigenspace 
Equivalence Theorem 

Skills 

Find the eigenvalues of a matrix. 

Find bases for the eigenspaces of a matrix. 


Exercise Set 5.1 


In Exercises 1-2, confirm by multiplication that x is an eigenvector of A, and find the corresponding 
eigenvalue. 


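1. $A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{bmatrix}; \quad \mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$

Answer:

5

2. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}; \quad \mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$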


3. Find the characteristic equations of the following matrices: 


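(a) $\begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$ (b) $\begin{bmatrix} 10 & -9 \\ 4 & -2 \end{bmatrix}$ (c) $\begin{bmatrix} 0 & 3 \\ 4 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} -2 & -7 \\ 1 & 2 \end{bmatrix}$ (e) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ (f) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Answer:

(a) $\lambda^2 - 2\lambda - 3 = 0$

(b) $\lambda^2 - 8\lambda + 16 = 0$

(c) $\lambda^2 - 12 = 0$

(d) $\lambda^2 + 3 = 0$

(e) $\lambda^2 = 0$

(f) $\lambda^2 - 2\lambda + 1 = 0$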

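4. Find the eigenvalues of the matrices in Exercise 3.

5. Find bases for the eigenspaces of the matrices in Exercise 3.

Answer:

(a) Basis for eigenspace corresponding to $\lambda = 3$: $\begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix}$; basis for eigenspace corresponding to $\lambda = -1$: $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(b) Basis for eigenspace corresponding to $\lambda = 4$: $\begin{bmatrix} \tfrac{3}{2} \\ 1 \end{bmatrix}$

(c) Basis for eigenspace corresponding to $\lambda = \sqrt{12}$: $\begin{bmatrix} \tfrac{3}{\sqrt{12}} \\ 1 \end{bmatrix}$; basis for eigenspace corresponding to $\lambda = -\sqrt{12}$: $\begin{bmatrix} -\tfrac{3}{\sqrt{12}} \\ 1 \end{bmatrix}$

(d) There are no eigenspaces.

(e) Basis for eigenspace corresponding to $\lambda = 0$: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(f) Basis for eigenspace corresponding to $\lambda = 1$: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}$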


6. Find the characteristic equations of the following matrices: 


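(a) $\begin{bmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$ (b) $\begin{bmatrix} 3 & 0 & -5 \\ \tfrac{1}{5} & -1 & 0 \\ 1 & 1 & -2 \end{bmatrix}$ (c) $\begin{bmatrix} -2 & 0 & 1 \\ -6 & -2 & 0 \\ 19 & 5 & -4 \end{bmatrix}$

(d) $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 3 & 0 \\ -4 & 13 & -1 \end{bmatrix}$ (e) $\begin{bmatrix} 5 & 0 & 1 \\ 1 & 1 & 0 \\ -7 & 1 & 0 \end{bmatrix}$ (f) $\begin{bmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{bmatrix}$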

7. Find the eigenvalues of the matrices in Exercise 6. 

Answer:

(a) $1, 2, 3$

(b) $-\sqrt{2}, 0, \sqrt{2}$

(c) $-8$

(d) $2$

(e) $2$

(f) $-4, 3$


8. Find bases for the eigenspaces of the matrices in Exercise 6. 


9. Find the characteristic equations of the following matrices: 


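(a) $\begin{bmatrix} 0 & 0 & 2 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$ (b) $\begin{bmatrix} 10 & -9 & 0 & 0 \\ 4 & -2 & 0 & 0 \\ 0 & 0 & -2 & -7 \\ 0 & 0 & 1 & 2 \end{bmatrix}$

Answer:

(a) $\lambda^4 + \lambda^3 - 3\lambda^2 - \lambda + 2 = 0$

(b) $\lambda^4 - 8\lambda^3 + 19\lambda^2 - 24\lambda + 48 = 0$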

10. Find the eigenvalues of the matrices in Exercise 9. 

11. Find bases for the eigenspaces of the matrices in Exercise 9. 

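Answer:

(a) $\lambda = 1$: basis $\begin{bmatrix} 2 \\ 3 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$; $\lambda = -2$: basis $\begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$; $\lambda = -1$: basis $\begin{bmatrix} -2 \\ 1 \\ 1 \\ 0 \end{bmatrix}$

(b) $\lambda = 4$: basis $\begin{bmatrix} \tfrac{3}{2} \\ 1 \\ 0 \\ 0 \end{bmatrix}$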


12. By inspection, find the eigenvalues of the following matrices: 


(a) 

(b) 



6 

5_ 

0 0 

7 0 

8 1 


(c) 


1 

3 

0 


0 

0 


0 0 0 

0 0 
0 1 0 
0 o 1 


13. Find the eigenvalues of fp for 


A = 


3 7 11 


0 0 4 

0 0 2 


Answer: 




14. Find the eigenvalues and bases for the eigenspaces of A 1 -’ for 


A = 


-1 -2 -2 

1 2 1 

-1 -1 0 


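15. Let $A$ be a $2 \times 2$ matrix, and call a line through the origin of $R^2$ invariant under $A$ if $A\mathbf{x}$ lies on the line
when $\mathbf{x}$ does. Find equations for all lines in $R^2$, if any, that are invariant under the given matrix.

(a) $A = \begin{bmatrix} 4 & -1 \\ 2 & 1 \end{bmatrix}$ (b) $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ (c) $A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}$

Answer:

(a) $y = x$ and $y = 2x$

(b) No lines

(c) $y = 0$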

16. Find $\det(A)$ given that $A$ has $p(\lambda)$ as its characteristic polynomial.

(a) $p(\lambda) = \lambda^3 - 2\lambda^2 + \lambda + 5$

(b) $p(\lambda) = \lambda^4 - \lambda^3 + 7$

[Hint: See the proof of Theorem 5.1.5.]

17. Let $A$ be an $n \times n$ matrix.

(a) Prove that the characteristic polynomial of $A$ has degree $n$.

(b) Prove that the coefficient of $\lambda^n$ in the characteristic polynomial is 1.

18. Show that the characteristic equation of a $2 \times 2$ matrix $A$ can be expressed as $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$,
where $\operatorname{tr}(A)$ is the trace of $A$.

19. Use the result in Exercise 18 to show that if

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

then the solutions of the characteristic equation of $A$ are

$$\lambda = \tfrac{1}{2}\left[(a + d) \pm \sqrt{(a - d)^2 + 4bc}\right]$$

Use this result to show that $A$ has

(a) two distinct real eigenvalues if $(a - d)^2 + 4bc > 0$.

(b) two repeated real eigenvalues if $(a - d)^2 + 4bc = 0$.

(c) complex conjugate eigenvalues if $(a - d)^2 + 4bc < 0$.



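20. Let $A$ be the matrix in Exercise 19. Show that if $b \neq 0$, then

$$\mathbf{x}_1 = \begin{bmatrix} -b \\ a - \lambda_1 \end{bmatrix} \quad\text{and}\quad \mathbf{x}_2 = \begin{bmatrix} -b \\ a - \lambda_2 \end{bmatrix}$$

are eigenvectors of $A$ that correspond, respectively, to the eigenvalues

$$\lambda_1 = \tfrac{1}{2}\left[(a + d) + \sqrt{(a - d)^2 + 4bc}\right] \quad\text{and}\quad \lambda_2 = \tfrac{1}{2}\left[(a + d) - \sqrt{(a - d)^2 + 4bc}\right]$$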


21. Use the result of Exercise 18 to prove that if $p(\lambda)$ is the characteristic polynomial of a $2 \times 2$ matrix $A$,
then $p(A) = 0$.

22. Prove: If $a$, $b$, $c$, and $d$ are integers such that $a + b = c + d$, then

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

has integer eigenvalues, namely, $\lambda_1 = a + b$ and $\lambda_2 = a - c$.

23. Prove: If $\lambda$ is an eigenvalue of an invertible matrix $A$, and $\mathbf{x}$ is a corresponding eigenvector, then $1/\lambda$ is
an eigenvalue of $A^{-1}$, and $\mathbf{x}$ is a corresponding eigenvector.

24. Prove: If $\lambda$ is an eigenvalue of $A$, $\mathbf{x}$ is a corresponding eigenvector, and $s$ is a scalar, then $\lambda - s$ is an
eigenvalue of $A - sI$, and $\mathbf{x}$ is a corresponding eigenvector.

25. Prove: If $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector, then $s\lambda$ is an eigenvalue of $sA$ for
every scalar $s$, and $\mathbf{x}$ is a corresponding eigenvector.


26. Find the eigenvalues and bases for the eigenspaces of

$$A = \begin{bmatrix} -2 & 2 & 3 \\ -2 & 3 & 2 \\ -4 & 2 & 5 \end{bmatrix}$$

and then use Exercises 23 and 24 to find the eigenvalues and bases for the eigenspaces of

(a) $A^{-1}$

(b) $A - 3I$

(c) $A + 2I$


27. (a) Prove that if $A$ is a square matrix, then $A$ and $A^T$ have the same eigenvalues. [Hint: Look at the
characteristic equation $\det(\lambda I - A) = 0$.]

(b) Show that $A$ and $A^T$ need not have the same eigenspaces. [Hint: Use the result in Exercise 20 to find
a $2 \times 2$ matrix for which $A$ and $A^T$ have different eigenspaces.]


28. Suppose that the characteristic polynomial of some matrix $A$ is found to be
$p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part, answer the question and explain your reasoning.

(a) What is the size of $A$?

(b) Is $A$ invertible?

(c) How many eigenspaces does $A$ have?



29. The eigenvectors that we have been studying are sometimes called right eigenvectors to distinguish them
from left eigenvectors, which are $n \times 1$ column matrices $\mathbf{x}$ that satisfy the equation $\mathbf{x}^T A = \mu\mathbf{x}^T$ for some
scalar $\mu$. What is the relationship, if any, between the right eigenvectors and corresponding eigenvalues $\lambda$
of $A$ and the left eigenvectors and corresponding eigenvalues $\mu$ of $A$?

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) If $A$ is a square matrix and $A\mathbf{x} = \lambda\mathbf{x}$ for some nonzero scalar $\lambda$, then $\mathbf{x}$ is an eigenvector of $A$.

Answer:

False

(b) If $\lambda$ is an eigenvalue of a matrix $A$, then the linear system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has only the trivial solution.

Answer:

False

(c) If the characteristic polynomial of a matrix $A$ is $p(\lambda) = \lambda^2 + 1$, then $A$ is invertible.

Answer:

True

(d) If $\lambda$ is an eigenvalue of a matrix $A$, then the eigenspace of $A$ corresponding to $\lambda$ is the set of eigenvectors
of $A$ corresponding to $\lambda$.

Answer:

False

(e) If 0 is an eigenvalue of a matrix $A$, then $A^2$ is singular.

Answer:

True

(f) The eigenvalues of a matrix $A$ are the same as the eigenvalues of the reduced row echelon form of $A$.

Answer:

False

(g) If 0 is an eigenvalue of a matrix $A$, then the set of columns of $A$ is linearly independent.

Answer:

False
5.2 Diagonalization

In this section we will be concerned with the problem of finding a basis for $R^n$ that consists of eigenvectors of an
$n \times n$ matrix $A$. Such bases can be used to study geometric properties of $A$ and to simplify various numerical
computations. These bases are also of physical significance in a wide variety of applications, some of which will be 
considered later in this text. 


The Matrix Diagonalization Problem 

Our first objective in this section is to show that the following two seemingly different problems are equivalent.

Problem 1 Given an $n \times n$ matrix $A$, does there exist an invertible matrix $P$ such that $P^{-1}AP$ is diagonal?

Problem 2 Given an $n \times n$ matrix $A$, does $A$ have $n$ linearly independent eigenvectors?


Similarity 

The matrix product $P^{-1}AP$ that appears in Problem 1 is called a similarity transformation of the matrix $A$. Such
products are important in the study of eigenvectors and eigenvalues, so we will begin with some terminology about
them.


DEFINITION 1 

If $A$ and $B$ are square matrices, then we say that $B$ is similar to $A$ if there is an invertible matrix $P$ such that
$B = P^{-1}AP$.


Note that if $B$ is similar to $A$, then it is also true that $A$ is similar to $B$, since we can express $A$ as $A = Q^{-1}BQ$ by
taking $Q = P^{-1}$. This being the case, we will usually say that $A$ and $B$ are similar matrices if either is similar to
the other.


Similarity Invariants 

Similar matrices have many properties in common. For example, if $B = P^{-1}AP$, then it follows that $A$ and $B$ have
the same determinant, since

$$\det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)}\det(A)\det(P) = \det(A)$$

In general, any property that is shared by all similar matrices is called a similarity invariant or is said to be 
invariant under similarity. Table 1 lists the most important similarity invariants. The proofs of some of these 
results are given as exercises. 


Table 1 Similarity Invariants

Property: Description

Determinant: $A$ and $P^{-1}AP$ have the same determinant.

Invertibility: $A$ is invertible if and only if $P^{-1}AP$ is invertible.

Rank: $A$ and $P^{-1}AP$ have the same rank.

Nullity: $A$ and $P^{-1}AP$ have the same nullity.

Trace: $A$ and $P^{-1}AP$ have the same trace.

Characteristic polynomial: $A$ and $P^{-1}AP$ have the same characteristic polynomial.

Eigenvalues: $A$ and $P^{-1}AP$ have the same eigenvalues.

Eigenspace dimension: If $\lambda$ is an eigenvalue of $A$ (and hence of $P^{-1}AP$), then the eigenspace of $A$
corresponding to $\lambda$ and the eigenspace of $P^{-1}AP$ corresponding to $\lambda$ have the same
dimension.
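A few of the invariants in Table 1 can be observed on a small example. The sketch below takes the $2 \times 2$ matrix from Example 6 of Section 5.1 and a hypothetical invertible matrix $P$ (chosen with $\det(P) = 1$ so the arithmetic stays simple), forms $B = P^{-1}AP$ with exact fractions, and compares determinants and traces. Since a $2 \times 2$ characteristic polynomial is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, agreement of trace and determinant also means agreement of characteristic polynomials and eigenvalues. All helper names are hypothetical.

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(P):
    """Inverse of a 2 x 2 matrix from the formula (1/det) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("P is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[3, 0], [8, -1]]   # matrix of Example 6, Section 5.1
P = [[1, 2], [1, 3]]    # hypothetical invertible matrix, det(P) = 1
B = mat_mul(mat_mul(inverse_2x2(P), A), P)
```

The matrix `B` looks quite different from `A`, yet both have determinant $-3$ and trace $2$, and therefore eigenvalues $3$ and $-1$.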


Expressed in the language of similarity, Problem 1 posed above is equivalent to asking whether the matrix A is 
similar to a diagonal matrix. If so, the diagonal matrix will have all of the similarity-invariant properties of A, but 
will have a simpler form, making it easier to analyze and work with. This important idea has some associated 
terminology. 


DEFINITION 2 

A square matrix $A$ is said to be diagonalizable if it is similar to some diagonal matrix; that is, if there exists
an invertible matrix $P$ such that $P^{-1}AP$ is diagonal. In this case the matrix $P$ is said to diagonalize $A$.


The following theorem shows that Problems 1 and 2 posed above are actually two different forms of the same 
mathematical problem. 


THEOREM 5.2.1 


If $A$ is an $n \times n$ matrix, the following statements are equivalent.

(a) $A$ is diagonalizable.

(b) $A$ has $n$ linearly independent eigenvectors.


Part (b) of Theorem 5.2.1 is equivalent to saying
that there is a basis for $R^n$ consisting of
eigenvectors of $A$. Why?


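Proof (a) ⇒ (b) Since $A$ is assumed to be diagonalizable, it follows that there exist an invertible matrix $P$ and a
diagonal matrix $D$ such that $P^{-1}AP = D$ or, equivalently,

$$AP = PD \qquad (1)$$

If we denote the column vectors of $P$ by $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, and if we assume that the diagonal entries of $D$ are
$\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Formula 6 of Section 1.3 the left side of (1) can be expressed as

$$AP = A[\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n] = [A\mathbf{p}_1 \;\; A\mathbf{p}_2 \;\; \cdots \;\; A\mathbf{p}_n]$$

and, as noted in the comment following Example 1 of Section 1.7, the right side of (1) can be expressed as

$$PD = [\lambda_1\mathbf{p}_1 \;\; \lambda_2\mathbf{p}_2 \;\; \cdots \;\; \lambda_n\mathbf{p}_n]$$

Thus, it follows from (1) that

$$A\mathbf{p}_1 = \lambda_1\mathbf{p}_1, \quad A\mathbf{p}_2 = \lambda_2\mathbf{p}_2, \quad \ldots, \quad A\mathbf{p}_n = \lambda_n\mathbf{p}_n \qquad (2)$$

Since $P$ is invertible, we know from Theorem 5.1.6 that its column vectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ are linearly independent
(and hence nonzero). Thus, it follows from (2) that these $n$ column vectors are eigenvectors of $A$.

(b) ⇒ (a) Assume that $A$ has $n$ linearly independent eigenvectors, $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, and that $\lambda_1, \lambda_2, \ldots, \lambda_n$ are
the corresponding eigenvalues. If we let

$$P = [\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n]$$

and if we let $D$ be the diagonal matrix that has $\lambda_1, \lambda_2, \ldots, \lambda_n$ as its successive diagonal entries, then

$$AP = A[\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n] = [A\mathbf{p}_1 \;\; A\mathbf{p}_2 \;\; \cdots \;\; A\mathbf{p}_n] = [\lambda_1\mathbf{p}_1 \;\; \lambda_2\mathbf{p}_2 \;\; \cdots \;\; \lambda_n\mathbf{p}_n] = PD$$

Since the column vectors of $P$ are linearly independent, it follows from Theorem 5.1.6 that $P$ is invertible, so that
this last equation can be rewritten as $P^{-1}AP = D$, which shows that $A$ is diagonalizable.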


Procedure for Diagonalizing a Matrix 

The preceding theorem guarantees that an n x n matrix A with n linearly independent eigenvectors is 
diagonalizable, and the proof suggests the following method for diagonalizing A. 


Procedure for Diagonalizing a Matrix 


Step 1. Confirm that the matrix is actually diagonalizable by finding $n$ linearly independent eigenvectors.
One way to do this is by finding a basis for each eigenspace and merging these basis vectors into a single
set $S$. If this set has fewer than $n$ vectors, then the matrix is not diagonalizable.

Step 2. Form the matrix $P = [\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n]$ that has the vectors in $S$ as its column vectors.

Step 3. The matrix $P^{-1}AP$ will be diagonal and have the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ corresponding to the
eigenvectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ as its successive diagonal entries.


EXAMPLE 1 Finding a Matrix P That Diagonalizes a Matrix A 


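Find a matrix $P$ that diagonalizes

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

In Example 7 of the preceding section we found the characteristic equation of $A$ to be

$$(\lambda - 1)(\lambda - 2)^2 = 0$$

and we found the following bases for the eigenspaces:

$$\lambda = 2:\ \mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},\ \mathbf{p}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = 1:\ \mathbf{p}_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

There are three basis vectors in total, so the matrix

$$P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

diagonalizes $A$. As a check, you should verify that

$$P^{-1}AP = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$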

In general, there is no preferred order for the columns of $P$. Since the $i$th diagonal entry of $P^{-1}AP$ is an eigenvalue
for the $i$th column vector of $P$, changing the order of the columns of $P$ just changes the order of the eigenvalues on
the diagonal of $P^{-1}AP$. Thus, had we written

$$P = \begin{bmatrix} -1 & -2 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$

in the preceding example, we would have obtained

$$P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$


EXAMPLE 2 A Matrix That Is Not Diagonalizable 

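Find a matrix $P$ that diagonalizes

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -3 & 5 & 2 \end{bmatrix}$$

The characteristic polynomial of $A$ is

$$\det(\lambda I - A) = \det \begin{bmatrix} \lambda - 1 & 0 & 0 \\ -1 & \lambda - 2 & 0 \\ 3 & -5 & \lambda - 2 \end{bmatrix} = (\lambda - 1)(\lambda - 2)^2$$

so the characteristic equation is

$$(\lambda - 1)(\lambda - 2)^2 = 0$$

Thus, the distinct eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 2$. We leave it for you to show that bases for
the eigenspaces are

$$\lambda = 1:\ \mathbf{p}_1 = \begin{bmatrix} \tfrac{1}{8} \\ -\tfrac{1}{8} \\ 1 \end{bmatrix}; \qquad \lambda = 2:\ \mathbf{p}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Since $A$ is a $3 \times 3$ matrix and there are only two basis vectors in total, $A$ is not diagonalizable.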

If you are concerned only with determining whether a matrix is
diagonalizable and not with actually finding a diagonalizing matrix $P$, then it is not necessary to
compute bases for the eigenspaces; it suffices to find the dimensions of the eigenspaces. For this
example, the eigenspace corresponding to $\lambda = 1$ is the solution space of the system

$$\begin{bmatrix} 0 & 0 & 0 \\ -1 & -1 & 0 \\ 3 & -5 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Since the coefficient matrix has rank 2 (verify), the nullity of this matrix is 1 by Theorem 4.8.2, and
hence the eigenspace corresponding to $\lambda = 1$ is one-dimensional.

The eigenspace corresponding to $\lambda = 2$ is the solution space of the system

$$\begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 3 & -5 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

This coefficient matrix also has rank 2 and nullity 1 (verify), so the eigenspace corresponding to
$\lambda = 2$ is also one-dimensional. Since the eigenspaces produce a total of two basis vectors, and since
three are needed, the matrix $A$ is not diagonalizable.
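The dimension-counting shortcut in Example 2 amounts to computing the rank of $\lambda I - A$ for each eigenvalue and using nullity $= n - \text{rank}$. The sketch below does this with exact fractions; `rank` and `geometric_multiplicity` are hypothetical helper names, and the eigenvalues are assumed to be known already.

```python
from fractions import Fraction

def rank(M):
    """Rank via forward elimination to row echelon form with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    m, n = len(rows), len(rows[0])
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r

def geometric_multiplicity(A, lam):
    """Dimension of the eigenspace: nullity(lam*I - A) = n - rank(lam*I - A)."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return n - rank(M)

A = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]          # Example 2
dims = {lam: geometric_multiplicity(A, lam) for lam in (1, 2)}
diagonalizable = sum(dims.values()) == len(A)   # need n basis vectors in total
```

For the matrix of Example 1 the same computation gives dimensions 2 and 1, which sum to 3.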
























There is an assumption in Example 1 that the column vectors of P, which are made up of basis vectors from the 
various eigenspaces of A, are linearly independent. The following theorem, proved at the end of this section, shows 
that this is so. 


THEOREM 5.2.2 

If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are eigenvectors of a matrix $A$ corresponding to distinct eigenvalues, then
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly independent set.


Theorem 5.2.2 is a special case of a more general result: Suppose that $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct
eigenvalues and that we choose a linearly independent set in each of the corresponding eigenspaces. If we then
merge all these vectors into a single set, the result will still be a linearly independent set. For example, if we choose 
three linearly independent vectors from one eigenspace and two linearly independent vectors from another 
eigenspace, then the five vectors together form a linearly independent set. We omit the proof. 


As a consequence of Theorem 5.2.2, we obtain the following important result. 


THEOREM 5.2.3 

If an $n \times n$ matrix $A$ has $n$ distinct eigenvalues, then $A$ is diagonalizable.


Proof If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are eigenvectors corresponding to the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Theorem
5.2.2, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent. Thus, $A$ is diagonalizable by Theorem 5.2.1.


EXAMPLE 3 Using Theorem 5.2.3 


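We saw in Example 3 of the preceding section that

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix}$$

has three distinct eigenvalues: $\lambda = 4$, $\lambda = 2 + \sqrt{3}$, and $\lambda = 2 - \sqrt{3}$. Therefore, $A$ is diagonalizable
and

$$P^{-1}AP = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 + \sqrt{3} & 0 \\ 0 & 0 & 2 - \sqrt{3} \end{bmatrix}$$

for some invertible matrix $P$. If needed, the matrix $P$ can be found using the method shown in
Example 1 of this section.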






EXAMPLE 4 Diagonalizability of Triangular Matrices 


From Theorem 5.1.2, the eigenvalues of a triangular matrix are the entries on its main diagonal.
Thus, a triangular matrix with distinct entries on the main diagonal is diagonalizable. For example,

$$\begin{bmatrix} -1 & 2 & 4 & 0 \\ 0 & 3 & 1 & 7 \\ 0 & 0 & 5 & 8 \\ 0 & 0 & 0 & -2 \end{bmatrix}$$

is a diagonalizable matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 5$, $\lambda_4 = -2$.


Computing Powers of a Matrix 

There are many applications in which it is necessary to compute high powers of a square matrix $A$. We will show
next that if $A$ happens to be diagonalizable, then the computations can be simplified by diagonalizing $A$.

To start, suppose that $A$ is a diagonalizable $n \times n$ matrix, that $P$ diagonalizes $A$, and that

$$P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = D$$

Squaring both sides of this equation yields

$$(P^{-1}AP)^2 = \begin{bmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{bmatrix} = D^2$$

We can rewrite the left side of this equation as

$$(P^{-1}AP)^2 = P^{-1}APP^{-1}AP = P^{-1}AIAP = P^{-1}A^2P$$

from which we obtain the relationship $P^{-1}A^2P = D^2$. More generally, if $k$ is a positive integer, then a similar
computation will show that

$$P^{-1}A^kP = D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}$$

which we can rewrite as

$$A^k = PD^kP^{-1} = P\begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}P^{-1} \qquad (3)$$


Formula (3) reveals that raising a diagonalizable
matrix $A$ to a positive integer power has the effect
of raising its eigenvalues to that power.


Note that computing the right side of this formula involves only three matrix multiplications and the powers of the
diagonal entries of $D$. For matrices of large size and high powers, this involves substantially fewer operations
than computing $A^k$ directly.

EXAMPLE 5 Power of a Matrix 


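Use (3) to find $A^{13}$, where

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

We showed in Example 1 that the matrix $A$ is diagonalized by

$$P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

and that

$$D = P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Thus, it follows from (3) that

$$A^{13} = PD^{13}P^{-1} = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2^{13} & 0 & 0 \\ 0 & 2^{13} & 0 \\ 0 & 0 & 1^{13} \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} -8190 & 0 & -16382 \\ 8191 & 8192 & 8191 \\ 8191 & 0 & 16383 \end{bmatrix} \qquad (4)$$

With the method in the preceding example, most of the work is in diagonalizing $A$. Once that work is
done, it can be used to compute any power of $A$. Thus, to compute $A^{1000}$ we need only change the exponents from
13 to 1000 in (4).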
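Formula (3) and the result (4) can be cross-checked against naive repeated multiplication. The sketch below reuses $P$ and $P^{-1}$ from Example 1 and forms $D^k$ by raising only the diagonal entries to the $k$th power. The helper names are hypothetical, and Python integers have arbitrary precision, so even very large powers remain exact.

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def power_by_diagonalization(P, eigenvalues, P_inv, k):
    """Formula (3): A^k = P D^k P^{-1}, where D^k just powers the diagonal entries."""
    n = len(P)
    Dk = [[eigenvalues[i] ** k if i == j else 0 for j in range(n)] for i in range(n)]
    return mat_mul(mat_mul(P, Dk), P_inv)

def power_by_repeated_multiplication(A, k):
    """Reference computation: k successive products, starting from the identity."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]
P_inv = [[1, 0, 2], [1, 1, 1], [-1, 0, -1]]
A13 = power_by_diagonalization(P, [2, 2, 1], P_inv, 13)
```

Changing `13` to any other exponent changes only the diagonal powers. The two products by `P` and `P_inv` keep the same shape, which is the source of the savings described after Formula (3).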


















Eigenvalues of Powers of a Matrix 


Once the eigenvalues and eigenvectors of any square matrix $A$ are found, it is a simple matter to find the
eigenvalues and eigenvectors of any positive integer power of $A$. For example, if $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a
corresponding eigenvector, then

$$A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}$$

which shows not only that $\lambda^2$ is an eigenvalue of $A^2$ but that $\mathbf{x}$ is a corresponding eigenvector. In general, we have
the following result.

Note that diagonalizability is not a requirement in
Theorem 5.2.4.


THEOREM 5.2.4

If $\lambda$ is an eigenvalue of a square matrix $A$ and $\mathbf{x}$ is a corresponding eigenvector, and if $k$ is any positive
integer, then $\lambda^k$ is an eigenvalue of $A^k$ and $\mathbf{x}$ is a corresponding eigenvector.


Some problems that use this theorem are given in the exercises. 


Geometric and Algebraic Multiplicity 

Theorem 5.2.3 does not completely settle the diagonalizability question since it only guarantees that a square 
matrix with n distinct eigenvalues is diagonalizable, but does not preclude the possibility that there may exist 
diagonalizable matrices with fewer than n distinct eigenvalues. The following example shows that this is indeed the 
case. 


EXAMPLE 6 The Converse of Theorem 5.2.3 Is False 

Consider the matrices

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

It follows from Theorem 5.1.2 that both of these matrices have only one distinct eigenvalue, namely
$\lambda = 1$, and hence only one eigenspace. We leave it as an exercise for you to solve the systems

$$(\lambda I - I)\mathbf{x} = \mathbf{0} \quad\text{and}\quad (\lambda I - J)\mathbf{x} = \mathbf{0}$$

with $\lambda = 1$ and show that for $I$ the eigenspace is three-dimensional (all of $R^3$) and for $J$ it is
one-dimensional, consisting of all scalar multiples of

$$\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

This shows that the converse of Theorem 5.2.3 is false, since we have produced two $3 \times 3$ matrices
with fewer than three distinct eigenvalues, one of which is diagonalizable and the other of which is
not.


A full excursion into the study of diagonalizability is left for more advanced courses, but we will touch on one
theorem that is important to a fuller understanding of diagonalizability. It can be proved that if $\lambda_0$ is an eigenvalue
of $A$, then the dimension of the eigenspace corresponding to $\lambda_0$ cannot exceed the number of times that $\lambda - \lambda_0$
appears as a factor of the characteristic polynomial of $A$. For example, in Example 1 and Example 2 the
characteristic polynomial is

$$(\lambda - 1)(\lambda - 2)^2$$

Thus, the eigenspace corresponding to $\lambda = 1$ is at most (hence exactly) one-dimensional, and the eigenspace
corresponding to $\lambda = 2$ is at most two-dimensional. In Example 1 the eigenspace corresponding to $\lambda = 2$ actually
had dimension 2, resulting in diagonalizability, but in Example 2 the eigenspace corresponding to $\lambda = 2$ had only
dimension 1, resulting in nondiagonalizability.

There is some terminology that is related to these ideas. If $\lambda_0$ is an eigenvalue of an $n \times n$ matrix $A$, then the
dimension of the eigenspace corresponding to $\lambda_0$ is called the geometric multiplicity of $\lambda_0$, and the number of
times that $\lambda - \lambda_0$ appears as a factor in the characteristic polynomial of $A$ is called the algebraic multiplicity of $\lambda_0$.
The following theorem, which we state without proof, summarizes the preceding discussion. 


Geometric and Algebraic Multiplicity 

If A is a square matrix, then: 

(a) For every eigenvalue of A, the geometric multiplicity is less than or equal to the algebraic multiplicity. 

(b) A is diagonalizable if and only if the geometric multiplicity of every eigenvalue is equal to the 
algebraic multiplicity. 
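Both multiplicities can be computed for the matrices $I$ and $J$ of Example 6. Because both matrices are triangular, Theorem 5.1.2 implies that the algebraic multiplicity of $\lambda$ equals the number of diagonal entries equal to $\lambda$. The geometric multiplicity is $n - \text{rank}(\lambda I - A)$. The sketch below relies on that triangular shortcut, so it does not apply to general matrices, and its helper names are hypothetical.

```python
from fractions import Fraction

def rank(M):
    """Rank via forward elimination to row echelon form with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    m, n = len(rows), len(rows[0])
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r

def geometric_multiplicity(A, lam):
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return n - rank(M)

def algebraic_multiplicity_triangular(T, lam):
    """For triangular T, det(x*I - T) is the product of (x - t_ii), so count t_ii == lam."""
    return sum(1 for i in range(len(T)) if T[i][i] == lam)

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
J = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
summary = {name: (algebraic_multiplicity_triangular(M, 1), geometric_multiplicity(M, 1))
           for name, M in (("I", I3), ("J", J))}
```

The two multiplicities agree for $I$ and differ for $J$. This matches part (b) of the theorem, since $I$ is diagonalizable and $J$ is not.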


OPTIONAL 

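We will complete this section with an optional proof of Theorem 5.2.2.

Proof of Theorem 5.2.2 Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ be eigenvectors of $A$ corresponding to distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$. We will assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly dependent and obtain a contradiction. We can then
conclude that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent.

Since an eigenvector is nonzero by definition, $\{\mathbf{v}_1\}$ is linearly independent. Let $r$ be the largest integer such that
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is linearly independent. Since we are assuming that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly dependent, $r$
satisfies $1 \le r < k$. Moreover, by the definition of $r$, $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r+1}\}$ is linearly dependent. Thus, there are
scalars $c_1, c_2, \ldots, c_{r+1}$, not all zero, such that

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \qquad (5)$$

Multiplying both sides of (5) by $A$ and using the fact that

$$A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_{r+1} = \lambda_{r+1}\mathbf{v}_{r+1}$$

we obtain

$$c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_{r+1}\lambda_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \qquad (6)$$

If we now multiply both sides of (5) by $\lambda_{r+1}$ and subtract the resulting equation from (6), we obtain

$$c_1(\lambda_1 - \lambda_{r+1})\mathbf{v}_1 + c_2(\lambda_2 - \lambda_{r+1})\mathbf{v}_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})\mathbf{v}_r = \mathbf{0}$$

Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set, this equation implies that

$$c_1(\lambda_1 - \lambda_{r+1}) = c_2(\lambda_2 - \lambda_{r+1}) = \cdots = c_r(\lambda_r - \lambda_{r+1}) = 0$$

and since $\lambda_1, \lambda_2, \ldots, \lambda_{r+1}$ are assumed to be distinct, it follows that

$$c_1 = c_2 = \cdots = c_r = 0 \qquad (7)$$

Substituting these values in (5) yields

$$c_{r+1}\mathbf{v}_{r+1} = \mathbf{0}$$

Since the eigenvector $\mathbf{v}_{r+1}$ is nonzero, it follows that

$$c_{r+1} = 0 \qquad (8)$$

But equations (7) and (8) contradict the fact that $c_1, c_2, \ldots, c_{r+1}$ are not all zero, so the proof is complete.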


Concept Review 

Similarity transformation 
Similarity invariant 
Similar matrices 
Diagonalizable matrix 
Geometric multiplicity 
Algebraic multiplicity 

Skills 

Determine whether a square matrix A is diagonalizable. 

Diagonalize a square matrix $A$.

Find powers of a matrix using similarity. 

Find the geometric multiplicity and the algebraic multiplicity of an eigenvalue. 


Exercise Set 5.2 


In Exercises 1-4, show that $A$ and $B$ are not similar matrices.

1. $A = \begin{bmatrix} 1 & 1 \\ 3 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ 3 & -2 \end{bmatrix}$

Answer:

Possible reason: Determinants are different.


2. $A = \begin{bmatrix} 4 & -1 \\ 2 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 4 & 1 \\ 2 & 4 \end{bmatrix}$


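3. $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 0 \\ \tfrac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Answer:

Possible reason: Ranks are different.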


4. $A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \\ 3 & 0 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

5. Let $A$ be a $6 \times 6$ matrix with characteristic equation $\lambda^2(\lambda - 1)(\lambda - 2)^3 = 0$. What are the possible dimensions for eigenspaces of $A$?

Answer:

$\lambda = 0$: 1 or 2; $\lambda = 1$: 1; $\lambda = 2$: 1, 2, or 3

6. Let

$A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{bmatrix}$

(a) Find the eigenvalues of $A$.

(b) For each eigenvalue $\lambda$, find the rank of the matrix $\lambda I - A$.

(c) Is $A$ diagonalizable? Justify your conclusion.

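In Exercises 7-11, use the method of Exercise 6 to determine whether the matrix is diagonalizable.

7. $\begin{bmatrix} 2 & 0 \\ 1 & 2 \end{bmatrix}$

Answer:

Not diagonalizable

8. $\begin{bmatrix} 2 & -3 \\ 1 & -1 \end{bmatrix}$

9. $\begin{bmatrix} \cdots & \cdots & \cdots \\ 0 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}$

Answer:

Not diagonalizable

10. $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 3 & 0 \\ -4 & 13 & -1 \end{bmatrix}$

11. $\begin{bmatrix} 2 & -1 & 0 & 1 \\ 0 & 2 & 1 & -1 \\ 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix}$

Answer:

Not diagonalizable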

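In Exercises 12-15, find a matrix $P$ that diagonalizes $A$, and compute $P^{-1}AP$.

12. $A = \begin{bmatrix} -14 & 12 \\ -20 & 17 \end{bmatrix}$

13. $A = \begin{bmatrix} 1 & 0 \\ 6 & -1 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} \tfrac{1}{3} & 0 \\ 1 & 1 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

14. $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$

15. $A = \begin{bmatrix} 2 & 0 & -2 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} 0 & -2 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$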


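In Exercises 16-21, find the geometric and algebraic multiplicity of each eigenvalue of the matrix $A$, and determine whether $A$ is diagonalizable. If $A$ is diagonalizable, then find a matrix $P$ that diagonalizes $A$, and find $P^{-1}AP$.

16. $A = \begin{bmatrix} 19 & -9 & -6 \\ 25 & -11 & -9 \\ 17 & -9 & -4 \end{bmatrix}$

17. $A = \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 3 \\ 1 & 3 & 4 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$

18. $A = \begin{bmatrix} 5 & 0 & 0 \\ 1 & 5 & 0 \\ 0 & 1 & 5 \end{bmatrix}$

19. $A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 1 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

20. $A = \begin{bmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix}$

21. $A = \begin{bmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 5 & -5 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$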

22. Use the method of Example 5 to compute where 



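23. Use the method of Example 5 to compute $A^{11}$, where

$A = \begin{bmatrix} -1 & 7 & -1 \\ 0 & 1 & 0 \\ 0 & 15 & -2 \end{bmatrix}$

Answer:

$A^{11} = \begin{bmatrix} -1 & 10237 & -2047 \\ 0 & 1 & 0 \\ 0 & 10245 & -2048 \end{bmatrix}$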
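The answer to Exercise 23 can be checked with a short computation. In the Python sketch below, the columns of `P` are eigenvectors we computed for the eigenvalues $-1$, $1$, and $-2$. This is one possible choice; any nonzero multiples would also work. $A^{11}$ is then formed as $PD^{11}P^{-1}$, so only the diagonal entries of $D$ are raised to the 11th power. The function names are illustrative, and repeated multiplication is included only as an independent cross-check.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[-1, 7, -1], [0, 1, 0], [0, 15, -2]]
eigenvalues = [-1, 1, -2]
P = [[1, 1, 1], [0, 1, 0], [0, 5, 1]]         # columns: eigenvectors for -1, 1, -2
P_inv = [[1, 4, -1], [0, 1, 0], [0, -5, 1]]   # det(P) = 1, so the inverse has integer entries


def power_via_similarity(k):
    """A^k = P D^k P^{-1}; only the diagonal entries of D are raised to the k-th power."""
    D_k = [[eigenvalues[i] ** k if i == j else 0 for j in range(3)] for i in range(3)]
    return matmul(matmul(P, D_k), P_inv)


def power_by_repeated_product(k):
    """Independent cross-check: multiply A by itself k times."""
    R = [[int(i == j) for j in range(3)] for i in range(3)]
    for _ in range(k):
        R = matmul(R, A)
    return R


print(power_via_similarity(11))
# [[-1, 10237, -2047], [0, 1, 0], [0, 10245, -2048]]
```

The similarity approach does a fixed amount of work however large the exponent is. Repeated multiplication, by contrast, needs one matrix product per unit of the exponent.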

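24. In each part, compute the stated power of

$A = \begin{bmatrix} 1 & -2 & 8 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

(a) $A^{1000}$ (b) $A^{-1000}$ (c) $A^{2301}$ (d) $A^{-2301}$

25. Find $A^n$ if $n$ is a positive integer and

$A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$

Answer:

$A^n = PD^nP^{-1} = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & -1 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 3^n & 0 \\ 0 & 0 & 4^n \end{bmatrix}\begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{6} \\ \tfrac{1}{2} & 0 & -\tfrac{1}{2} \\ \tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}$

26. Let

$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

Show that

(a) $A$ is diagonalizable if $(a - d)^2 + 4bc > 0$.

(b) $A$ is not diagonalizable if $(a - d)^2 + 4bc < 0$.

[Hint: See Exercise 19 of Section 5.1.]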

27. In the case where the matrix $A$ in Exercise 26 is diagonalizable, find a matrix $P$ that diagonalizes $A$. [Hint: See Exercise 20 of Section 5.1.]

Answer:

One possibility is $P = \begin{bmatrix} -b & -b \\ a - \lambda_1 & a - \lambda_2 \end{bmatrix}$, where $\lambda_1$ and $\lambda_2$ are as in Exercise 20 of Section 5.1.


28. Prove that similar matrices have the same rank. 

29. Prove that similar matrices have the same nullity. 



30. Prove that similar matrices have the same trace. 

31. Prove that if $A$ is diagonalizable, then so is $A^k$ for every positive integer $k$.

32. Prove that if A is a diagonalizable matrix, then the rank of A is the number of nonzero eigenvalues of A. 

33. Suppose that the characteristic polynomial of some matrix $A$ is found to be $p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part, answer the question and explain your reasoning.

(a) What can you say about the dimensions of the eigenspaces of $A$?

(b) What can you say about the dimensions of the eigenspaces if you know that $A$ is diagonalizable?

(c) If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of eigenvectors of $A$, all of which correspond to the same eigenvalue of $A$, what can you say about the eigenvalue?

Answer:

(a) $\lambda = 1$: dimension $= 1$; $\lambda = 3$: dimension $\le 2$; $\lambda = 4$: dimension $\le 3$

(b) Dimensions will be exactly 1, 2, and 3.

(c) $\lambda = 4$


34. This problem will lead you through a proof of the fact that the algebraic multiplicity of an eigenvalue of an $n \times n$ matrix $A$ is greater than or equal to the geometric multiplicity. For this purpose, assume that $\lambda_0$ is an eigenvalue with geometric multiplicity $k$.

(a) Prove that there is a basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$ in which the first $k$ vectors of $B$ form a basis for the eigenspace corresponding to $\lambda_0$.

(b) Let $P$ be the matrix having the vectors in $B$ as columns. Prove that the product $AP$ can be expressed as

$$AP = P\begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}$$

[Hint: Compare the first $k$ column vectors on both sides.]

(c) Use the result in part (b) to prove that $A$ is similar to

$$C = \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}$$

and hence that $A$ and $C$ have the same characteristic polynomial.

(d) By considering $\det(\lambda I - C)$, prove that the characteristic polynomial of $C$ (and hence $A$) contains the factor $(\lambda - \lambda_0)$ at least $k$ times, thereby proving that the algebraic multiplicity of $\lambda_0$ is greater than or equal to the geometric multiplicity $k$.


True-False Exercises 


In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 

(a) Every square matrix is similar to itself. 

Answer: 

True 

(b) If A, B, and C are matrices for which A is similar to B and B is similar to C, then A is similar to C. 






Answer: 


True 

(c) If $A$ and $B$ are similar invertible matrices, then $A^{-1}$ and $B^{-1}$ are similar.

Answer: 

True 

(d) If $A$ is diagonalizable, then there is a unique matrix $P$ such that $P^{-1}AP$ is diagonal.

Answer: 

False 

(e) If $A$ is diagonalizable and invertible, then $A^{-1}$ is diagonalizable.

Answer: 

True 

(f) If $A$ is diagonalizable, then $A^T$ is diagonalizable.

Answer: 

True 

(g) If there is a basis for $R^n$ consisting of eigenvectors of an $n \times n$ matrix $A$, then $A$ is diagonalizable.
Answer: 

True 

(h) If every eigenvalue of a matrix $A$ has algebraic multiplicity 1, then $A$ is diagonalizable.

Answer: 

True 





5.3 Complex Vector Spaces 

Because the characteristic equation of any square matrix can have complex solutions, the notions of complex eigenvalues and 
eigenvectors arise naturally, even within the context of matrices with real entries. In this section we will discuss this idea and 
apply our results to study symmetric matrices in more detail. A review of the essentials of complex numbers appears in the 
back of this text. 


Review of Complex Numbers 


Recall that if $z = a + bi$ is a complex number, then:

• $\operatorname{Re}(z) = a$ and $\operatorname{Im}(z) = b$ are called the real part of $z$ and the imaginary part of $z$, respectively,

• $|z| = \sqrt{a^2 + b^2}$ is called the modulus (or absolute value) of $z$,

• $\bar{z} = a - bi$ is called the complex conjugate of $z$,

• $z\bar{z} = a^2 + b^2 = |z|^2$,

• the angle $\phi$ in Figure 5.3.1 is called an argument of $z$,

• $\operatorname{Re}(z) = |z|\cos\phi$,

• $\operatorname{Im}(z) = |z|\sin\phi$,

• $z = |z|(\cos\phi + i\sin\phi)$ is called the polar form of $z$.
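Python's built-in `complex` type and the standard `cmath` module can be used to experiment with these definitions. Python writes the imaginary unit as `j` rather than $i$. The value `z = 3 + 4j` below is an arbitrary illustration.

```python
import cmath
import math

z = 3 + 4j                      # z = a + bi with a = 3, b = 4

a, b = z.real, z.imag           # Re(z) and Im(z)
modulus = abs(z)                # |z| = sqrt(a^2 + b^2)
z_bar = z.conjugate()           # complex conjugate a - bi
r, phi = cmath.polar(z)         # |z| and an argument phi of z (radians, in [-pi, pi])

print(a, b, modulus, z_bar)     # 3.0 4.0 5.0 (3-4j)
print(z * z_bar)                # (25+0j), that is, |z|^2
# Polar form: z = |z|(cos(phi) + i sin(phi)), up to rounding error
print(cmath.isclose(r * (math.cos(phi) + 1j * math.sin(phi)), z))   # True
```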



Complex Eigenvalues 


In Formula 3 of Section 5.1 we observed that the characteristic equation of a general $n \times n$ matrix $A$ has the form

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \tag{1}$$

in which the highest power of $\lambda$ has a coefficient of 1. Up to now we have limited our discussion to matrices in which the solutions of (1) are real numbers. However, it is possible for the characteristic equation of a matrix $A$ with real entries to have imaginary solutions. For example, the characteristic equation of the matrix

$$A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}$$

is

$$\begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = 0$$

which has the imaginary solutions $\lambda = i$ and $\lambda = -i$. To deal with this case we will need to explore the notion of a complex vector space and some related ideas.











Vectors in C n 

A vector space in which scalars are allowed to be complex numbers is called a complex vector space. In this section we will be concerned only with the following complex generalization of the real vector space $R^n$.

DEFINITION 1

If $n$ is a positive integer, then a complex $n$-tuple is a sequence of $n$ complex numbers $(v_1, v_2, \ldots, v_n)$. The set of all complex $n$-tuples is called complex $n$-space and is denoted by $C^n$. Scalars are complex numbers, and the operations of addition, subtraction, and scalar multiplication are performed componentwise.


The terminology used for $n$-tuples of real numbers applies to complex $n$-tuples without change. Thus, if $v_1, v_2, \ldots, v_n$ are complex numbers, then we call $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ a vector in $C^n$ and $v_1, v_2, \ldots, v_n$ its components. Some examples of vectors in $C^3$ are

$$\mathbf{u} = (1 + i, -4i, 3 + 2i), \quad \mathbf{v} = (0, i, 5), \quad \mathbf{w} = \left(6 - \sqrt{2}\,i, \; 9 + \tfrac{1}{2}i, \; \pi i\right)$$

Every vector

$$\mathbf{v} = (v_1, v_2, \ldots, v_n) = (a_1 + b_1 i, a_2 + b_2 i, \ldots, a_n + b_n i)$$

in $C^n$ can be split into real and imaginary parts as

$$\mathbf{v} = (a_1, a_2, \ldots, a_n) + i(b_1, b_2, \ldots, b_n)$$

which we also denote as

$$\mathbf{v} = \operatorname{Re}(\mathbf{v}) + i\operatorname{Im}(\mathbf{v})$$

where

$$\operatorname{Re}(\mathbf{v}) = (a_1, a_2, \ldots, a_n) \quad\text{and}\quad \operatorname{Im}(\mathbf{v}) = (b_1, b_2, \ldots, b_n)$$

The vector

$$\bar{\mathbf{v}} = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n) = (a_1 - b_1 i, a_2 - b_2 i, \ldots, a_n - b_n i)$$

is called the complex conjugate of $\mathbf{v}$ and can be expressed in terms of $\operatorname{Re}(\mathbf{v})$ and $\operatorname{Im}(\mathbf{v})$ as

$$\bar{\mathbf{v}} = (a_1, a_2, \ldots, a_n) - i(b_1, b_2, \ldots, b_n) = \operatorname{Re}(\mathbf{v}) - i\operatorname{Im}(\mathbf{v}) \tag{2}$$

It follows that the vectors in $R^n$ can be viewed as those vectors in $C^n$ whose imaginary part is zero. Stated another way, a vector $\mathbf{v}$ in $C^n$ is in $R^n$ if and only if $\mathbf{v} = \bar{\mathbf{v}}$.

In this section we will also need to consider matrices with complex entries. Henceforth we will call a matrix $A$ a real matrix if its entries are required to be real numbers and a complex matrix if its entries are allowed to be complex numbers. The standard operations on real matrices carry over to complex matrices without change, and all of the familiar properties of matrices continue to hold.

If $A$ is a complex matrix, then $\operatorname{Re}(A)$ and $\operatorname{Im}(A)$ are the matrices formed from the real and imaginary parts of the entries of $A$, and $\bar{A}$ is the matrix formed by taking the complex conjugate of each entry in $A$.

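EXAMPLE 1 Real and Imaginary Parts of Vectors and Matrices

Let

$$\mathbf{v} = (3 + i, -2i, 5) \quad\text{and}\quad A = \begin{bmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{bmatrix}$$

Then

$$\bar{\mathbf{v}} = (3 - i, 2i, 5), \quad \operatorname{Re}(\mathbf{v}) = (3, 0, 5), \quad \operatorname{Im}(\mathbf{v}) = (1, -2, 0)$$

$$\bar{A} = \begin{bmatrix} 1 - i & i \\ 4 & 6 + 2i \end{bmatrix}, \quad \operatorname{Re}(A) = \begin{bmatrix} 1 & 0 \\ 4 & 6 \end{bmatrix}, \quad \operatorname{Im}(A) = \begin{bmatrix} 1 & -1 \\ 0 & -2 \end{bmatrix}$$

$$\det(A) = \begin{vmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{vmatrix} = (1 + i)(6 - 2i) - (-i)(4) = 8 + 8i$$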
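The quantities in Example 1 can be reproduced with a few small helper functions. The Python sketch below stores vectors as lists and matrices as lists of rows. The helper names (`conj`, `real_part`, `imag_part`, `det2`) are illustrative and do not come from any library.

```python
def conj(x):
    """Entrywise complex conjugate of a vector (list) or matrix (list of rows)."""
    if isinstance(x[0], list):
        return [conj(row) for row in x]
    return [complex(c).conjugate() for c in x]


def real_part(x):
    """Entrywise real part of a vector or matrix."""
    if isinstance(x[0], list):
        return [real_part(row) for row in x]
    return [complex(c).real for c in x]


def imag_part(x):
    """Entrywise imaginary part of a vector or matrix."""
    if isinstance(x[0], list):
        return [imag_part(row) for row in x]
    return [complex(c).imag for c in x]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


v = [3 + 1j, -2j, 5]
A = [[1 + 1j, -1j], [4, 6 - 2j]]

print(conj(v), real_part(v), imag_part(v))
print(conj(A), real_part(A), imag_part(A))
print(det2(A))   # (8+8j)
```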


Algebraic Properties of the Complex Conjugate 

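The next two theorems list some properties of complex vectors and matrices that we will need in this section. Some of the proofs are given as exercises.

THEOREM 5.3.1

If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $C^n$, and if $k$ is a scalar, then:

(a) $\bar{\bar{\mathbf{u}}} = \mathbf{u}$

(b) $\overline{k\mathbf{u}} = \bar{k}\,\bar{\mathbf{u}}$

(c) $\overline{\mathbf{u} + \mathbf{v}} = \bar{\mathbf{u}} + \bar{\mathbf{v}}$

(d) $\overline{\mathbf{u} - \mathbf{v}} = \bar{\mathbf{u}} - \bar{\mathbf{v}}$

THEOREM 5.3.2

If $A$ is an $m \times k$ complex matrix and $B$ is a $k \times n$ complex matrix, then:

(a) $\bar{\bar{A}} = A$

(b) $\overline{A^T} = (\bar{A})^T$

(c) $\overline{AB} = \bar{A}\,\bar{B}$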


The Complex Euclidean Inner Product 

The following definition extends the notions of dot product and norm to $C^n$.

DEFINITION 2

If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $C^n$, then the complex Euclidean inner product of $\mathbf{u}$ and $\mathbf{v}$ (also called the complex dot product) is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined as

$$\mathbf{u} \cdot \mathbf{v} = u_1\bar{v}_1 + u_2\bar{v}_2 + \cdots + u_n\bar{v}_n \tag{3}$$

We also define the Euclidean norm on $C^n$ to be

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2} \tag{4}$$

As in the real case, we call $\mathbf{v}$ a unit vector in $C^n$ if $\|\mathbf{v}\| = 1$, and we say two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$.

The complex conjugates in (3) ensure that $\|\mathbf{v}\|$ is a real number. Without them, the quantity $\mathbf{v} \cdot \mathbf{v}$ in (4) might be imaginary.


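EXAMPLE 2 Complex Euclidean Inner Product and Norm

Find $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{v} \cdot \mathbf{u}$, $\|\mathbf{u}\|$, and $\|\mathbf{v}\|$ for the vectors

$$\mathbf{u} = (1 + i, i, 3 - i) \quad\text{and}\quad \mathbf{v} = (1 + i, 2, 4i)$$

Solution

$$\mathbf{u} \cdot \mathbf{v} = (1 + i)\overline{(1 + i)} + i(\bar{2}) + (3 - i)\overline{(4i)} = (1 + i)(1 - i) + 2i + (3 - i)(-4i) = -2 - 10i$$

$$\mathbf{v} \cdot \mathbf{u} = (1 + i)\overline{(1 + i)} + 2(\bar{i}) + (4i)\overline{(3 - i)} = (1 + i)(1 - i) - 2i + 4i(3 + i) = -2 + 10i$$

$$\|\mathbf{u}\| = \sqrt{|1 + i|^2 + |i|^2 + |3 - i|^2} = \sqrt{2 + 1 + 10} = \sqrt{13}$$

$$\|\mathbf{v}\| = \sqrt{|1 + i|^2 + |2|^2 + |4i|^2} = \sqrt{2 + 4 + 16} = \sqrt{22}$$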
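Formulas (3) and (4) translate directly into code. In this Python sketch (the function names are illustrative), the conjugate is applied to the components of the second vector, which matches Definition 2. Some other texts conjugate the first vector instead, so the convention matters when comparing results across sources.

```python
import math


def cdot(u, v):
    """Complex Euclidean inner product (3): u1*conj(v1) + ... + un*conj(vn)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def cnorm(v):
    """Euclidean norm (4): sqrt(v . v); v . v is real and nonnegative."""
    return math.sqrt(cdot(v, v).real)


u = [1 + 1j, 1j, 3 - 1j]
v = [1 + 1j, 2, 4j]

print(cdot(u, v))            # (-2-10j)
print(cdot(v, u))            # (-2+10j)
print(cnorm(u), cnorm(v))    # approximately 3.6056 and 4.6904, i.e. sqrt(13) and sqrt(22)
```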


Recall from Table 1 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are column vectors in $R^n$, then their dot product can be expressed as

$$\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}$$

The analogous formulas in $C^n$ are (verify)

$$\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\bar{\mathbf{v}} = \bar{\mathbf{v}}^T\mathbf{u} \tag{5}$$

Example 2 reveals a major difference between the dot product on $R^n$ and the complex dot product on $C^n$. For the dot product on $R^n$ we always have $\mathbf{v} \cdot \mathbf{u} = \mathbf{u} \cdot \mathbf{v}$ (the symmetry property). For the complex dot product, the corresponding relationship is $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$, which is called its antisymmetry property. The following theorem is an analog of Theorem 3.2.2.


THEOREM 5.3.3

If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in $C^n$, and if $k$ is a scalar, then the complex Euclidean inner product has the following properties:

(a) $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$ [Antisymmetry property]

(b) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ [Distributive property]

(c) $k(\mathbf{u} \cdot \mathbf{v}) = (k\mathbf{u}) \cdot \mathbf{v}$ [Homogeneity property]

(d) $\mathbf{u} \cdot k\mathbf{v} = \bar{k}(\mathbf{u} \cdot \mathbf{v})$ [Antihomogeneity property]

(e) $\mathbf{v} \cdot \mathbf{v} \ge 0$ and $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$ [Positivity property]

Parts (c) and (d) of this theorem state that a scalar multiplying a complex Euclidean inner product can be regrouped with the first vector. To regroup it with the second vector, however, you must first take its complex conjugate. We will prove part (d) and leave the others as exercises.


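Proof (d)

$$k(\mathbf{u} \cdot \mathbf{v}) = k\,\overline{(\mathbf{v} \cdot \mathbf{u})} = \overline{\bar{k}}\;\overline{(\mathbf{v} \cdot \mathbf{u})} = \overline{\bar{k}(\mathbf{v} \cdot \mathbf{u})} = \overline{(\bar{k}\mathbf{v}) \cdot \mathbf{u}} = \mathbf{u} \cdot (\bar{k}\mathbf{v})$$

To complete the proof, substitute $\bar{k}$ for $k$ and use the fact that $\bar{\bar{k}} = k$.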
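Parts (a), (c), and (d) can be spot-checked numerically. The sketch below uses the vectors of Example 2. The scalar `k = 2 - 3j` is an arbitrary choice made for illustration, and a numerical check on one example is not a proof. The last line shows what goes wrong if the conjugate in part (d) is omitted.

```python
import cmath


def cdot(u, v):
    """Complex Euclidean inner product: sum of u_i * conj(v_i)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


u = [1 + 1j, 1j, 3 - 1j]
v = [1 + 1j, 2, 4j]
k = 2 - 3j                      # arbitrary complex scalar

uv = cdot(u, v)
ku = [k * a for a in u]
kv = [k * b for b in v]

print(cmath.isclose(uv, cdot(v, u).conjugate()))        # (a) antisymmetry: True
print(cmath.isclose(cdot(ku, v), k * uv))               # (c) homogeneity: True
print(cmath.isclose(cdot(u, kv), k.conjugate() * uv))   # (d) antihomogeneity: True
print(cmath.isclose(cdot(u, kv), k * uv))               # conjugate omitted: False
```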


Vector Concepts in C n 


Except for the use of complex scalars, the notions of linear combination, linear independence, subspace, spanning, basis, and dimension carry over without change to $C^n$.

Is $R^n$ a subspace of $C^n$? Explain.

Eigenvalues and eigenvectors are defined for complex matrices exactly as for real matrices. If $A$ is an $n \times n$ matrix with complex entries, then the complex roots of the characteristic equation $\det(\lambda I - A) = 0$ are called complex eigenvalues of $A$. As in the real case, $\lambda$ is a complex eigenvalue of $A$ if and only if there exists a nonzero vector $\mathbf{x}$ in $C^n$ such that $A\mathbf{x} = \lambda\mathbf{x}$. Each such $\mathbf{x}$ is called a complex eigenvector of $A$ corresponding to $\lambda$. The complex eigenvectors of $A$ corresponding to $\lambda$ are the nonzero solutions of the linear system $(\lambda I - A)\mathbf{x} = \mathbf{0}$. The set of all such solutions is a subspace of $C^n$, called the eigenspace of $A$ corresponding to $\lambda$.

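The following theorem states that if a real matrix has complex eigenvalues, then those eigenvalues and their corresponding eigenvectors occur in conjugate pairs.

THEOREM 5.3.4

If $\lambda$ is an eigenvalue of a real $n \times n$ matrix $A$, and if $\mathbf{x}$ is a corresponding eigenvector, then $\bar{\lambda}$ is also an eigenvalue of $A$, and $\bar{\mathbf{x}}$ is a corresponding eigenvector.

Proof Since $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector, we have

$$\overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}} \tag{6}$$

However, $\bar{A} = A$, since $A$ has real entries, so it follows from part (c) of Theorem 5.3.2 that

$$\overline{A\mathbf{x}} = \bar{A}\bar{\mathbf{x}} = A\bar{\mathbf{x}} \tag{7}$$

Equations (6) and (7) together imply that

$$A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}}$$

in which $\bar{\mathbf{x}} \ne \mathbf{0}$ (why?). This tells us that $\bar{\lambda}$ is an eigenvalue of $A$ and $\bar{\mathbf{x}}$ is a corresponding eigenvector.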


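EXAMPLE 3 Complex Eigenvalues and Eigenvectors

Find the eigenvalues and bases for the eigenspaces of

$$A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}$$

Solution The characteristic polynomial of $A$ is

$$\begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = (\lambda - i)(\lambda + i)$$

so the eigenvalues of $A$ are $\lambda = i$ and $\lambda = -i$. Note that these eigenvalues are complex conjugates, as guaranteed by Theorem 5.3.4.

To find the eigenvectors we must solve the system

$$\begin{bmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

with $\lambda = i$ and then with $\lambda = -i$. With $\lambda = i$, this system becomes

$$\begin{bmatrix} i + 2 & 1 \\ -5 & i - 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{8}$$

We could solve this system by reducing the augmented matrix

$$\left[\begin{array}{cc|c} i + 2 & 1 & 0 \\ -5 & i - 2 & 0 \end{array}\right] \tag{9}$$

to reduced row echelon form by Gauss-Jordan elimination, though the complex arithmetic is somewhat tedious. A simpler procedure here is first to observe that the reduced row echelon form of (9) must have a row of zeros, because (8) has nontrivial solutions. This being the case, each row of (9) must be a scalar multiple of the other, and hence the first row can be made into a row of zeros by adding a suitable multiple of the second row to it. Accordingly, we can simply set the entries in the first row to zero, then interchange the rows, and then multiply the new first row by $-\tfrac{1}{5}$ to obtain the reduced row echelon form

$$\left[\begin{array}{cc|c} 1 & \tfrac{2}{5} - \tfrac{1}{5}i & 0 \\ 0 & 0 & 0 \end{array}\right]$$

Thus, a general solution of the system is

$$x_1 = \left(-\tfrac{2}{5} + \tfrac{1}{5}i\right)t, \qquad x_2 = t$$

This tells us that the eigenspace corresponding to $\lambda = i$ is one-dimensional and consists of all complex scalar multiples of the basis vector

$$\mathbf{x} = \begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix} \tag{10}$$

As a check, let us confirm that $A\mathbf{x} = i\mathbf{x}$. We obtain

$$A\mathbf{x} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}\begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix} = \begin{bmatrix} -2\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) + 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{5} - \tfrac{2}{5}i \\ i \end{bmatrix} = i\mathbf{x}$$

We could find a basis for the eigenspace corresponding to $\lambda = -i$ in a similar way, but the work is unnecessary, since Theorem 5.3.4 implies that

$$\bar{\mathbf{x}} = \begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix} \tag{11}$$

must be a basis for this eigenspace. The following computations confirm that $\bar{\mathbf{x}}$ is an eigenvector of $A$ corresponding to $\lambda = -i$:

$$A\bar{\mathbf{x}} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}\begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix} = \begin{bmatrix} -2\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) + 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{5} + \tfrac{2}{5}i \\ -i \end{bmatrix} = -i\bar{\mathbf{x}}$$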
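The checks $A\mathbf{x} = i\mathbf{x}$ and $A\bar{\mathbf{x}} = -i\bar{\mathbf{x}}$ can also be carried out numerically. This Python sketch uses floating-point complex numbers, so it compares results within a small tolerance rather than exactly. The helpers `matvec` and `is_eigenpair` are introduced only for illustration.

```python
import cmath


def matvec(A, x):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def is_eigenpair(A, lam, x):
    """True if A x and lam x agree entrywise up to rounding error."""
    return all(cmath.isclose(p, lam * c, abs_tol=1e-12) for p, c in zip(matvec(A, x), x))


A = [[-2, -1], [5, 2]]
x = [-2/5 + 1j/5, 1]                          # basis vector (10), for lambda = i
x_bar = [complex(c).conjugate() for c in x]   # vector (11), for lambda = -i

print(is_eigenpair(A, 1j, x))        # True
print(is_eigenpair(A, -1j, x_bar))   # True, as Theorem 5.3.4 predicts
print(is_eigenpair(A, 1j, x_bar))    # False
```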


Since a number of our subsequent examples will involve $2 \times 2$ matrices with real entries, it will be useful to discuss some general results about the eigenvalues of such matrices. Observe first that the characteristic polynomial of the matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - a & -b \\ -c & \lambda - d \end{vmatrix} = (\lambda - a)(\lambda - d) - bc = \lambda^2 - (a + d)\lambda + (ad - bc)$$

We can express this in terms of the trace and determinant of $A$ as

$$\det(\lambda I - A) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A) \tag{12}$$

from which it follows that the characteristic equation of $A$ is

$$\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0 \tag{13}$$

Now recall from algebra that if $ax^2 + bx + c = 0$ is a quadratic equation with real coefficients, then the discriminant $b^2 - 4ac$ determines the nature of the roots:

$b^2 - 4ac > 0$ [Two distinct real roots]

$b^2 - 4ac = 0$ [One repeated real root]

$b^2 - 4ac < 0$ [Two conjugate imaginary roots]

Applying this to (13) with $a = 1$, $b = -\operatorname{tr}(A)$, and $c = \det(A)$ yields the following theorem.

















Olga Taussky-Todd (1906-1995) 


Olga Taussky-Todd was one of the pioneering women in matrix analysis and the first woman 
appointed to the faculty at the California Institute of Technology. She worked at the National Physical Laboratory in 
London during World War II, where she was assigned to study flutter in supersonic aircraft. While there, she realized 
that some results about the eigenvalues of a certain 6x6 complex matrix could be used to answer key questions about 
the flutter problem that would otherwise have required laborious calculation. After World War II Olga Taussky-Todd 
continued her work on matrix-related subjects and helped to draw many known but disparate results about matrices 
into the coherent subject that we now call matrix theory. 

[Image: Courtesy of the Archives, California Institute of Technology ] 


THEOREM 5.3.5

If $A$ is a $2 \times 2$ matrix with real entries, then the characteristic equation of $A$ is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$ and

(a) $A$ has two distinct real eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) > 0$;

(b) $A$ has one repeated real eigenvalue if $\operatorname{tr}(A)^2 - 4\det(A) = 0$;

(c) $A$ has two complex conjugate eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) < 0$.


EXAMPLE 4 Eigenvalues of a 2 × 2 Matrix

In each part, use Formula (13) for the characteristic equation to find the eigenvalues of

(a) $A = \begin{bmatrix} 2 & 2 \\ -1 & 5 \end{bmatrix}$ (b) $A = \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix}$ (c) $A = \begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}$

Solution

(a) We have $\operatorname{tr}(A) = 7$ and $\det(A) = 12$, so the characteristic equation of $A$ is

$$\lambda^2 - 7\lambda + 12 = 0$$

Factoring yields $(\lambda - 4)(\lambda - 3) = 0$, so the eigenvalues of $A$ are $\lambda = 4$ and $\lambda = 3$.

(b) We have $\operatorname{tr}(A) = 2$ and $\det(A) = 1$, so the characteristic equation of $A$ is

$$\lambda^2 - 2\lambda + 1 = 0$$

Factoring this equation yields $(\lambda - 1)^2 = 0$, so $\lambda = 1$ is the only eigenvalue of $A$; it has algebraic multiplicity 2.

(c) We have $\operatorname{tr}(A) = 4$ and $\det(A) = 13$, so the characteristic equation of $A$ is

$$\lambda^2 - 4\lambda + 13 = 0$$

Solving this equation by the quadratic formula yields

$$\lambda = \frac{4 \pm \sqrt{(-4)^2 - 4(13)}}{2} = \frac{4 \pm \sqrt{-36}}{2} = 2 \pm 3i$$

Thus, the eigenvalues of $A$ are $\lambda = 2 + 3i$ and $\lambda = 2 - 3i$.
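Theorem 5.3.5 and Formula (13) give a short procedure for any real $2 \times 2$ matrix: compute the trace and determinant, inspect the discriminant, and apply the quadratic formula. The Python sketch below uses `cmath.sqrt`, so a negative discriminant yields the conjugate pair directly. The function name `eigenvalues_2x2` is an illustrative choice.

```python
import cmath


def eigenvalues_2x2(A):
    """Classify and compute the eigenvalues of a real 2 x 2 matrix via Formula (13)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det          # discriminant of lambda^2 - tr(A) lambda + det(A)
    if disc > 0:
        kind = "two distinct real eigenvalues"
    elif disc == 0:
        kind = "one repeated real eigenvalue"
    else:
        kind = "two complex conjugate eigenvalues"
    root = cmath.sqrt(disc)
    return kind, ((tr + root) / 2, (tr - root) / 2)


# The three matrices of Example 4
for A in ([[2, 2], [-1, 5]], [[0, -1], [1, 2]], [[2, 3], [-3, 2]]):
    print(eigenvalues_2x2(A))
```

For the three matrices of Example 4, this reproduces $\lambda = 4, 3$; $\lambda = 1$ (repeated); and $\lambda = 2 \pm 3i$. The eigenvalues are returned as Python complex numbers in every case, with zero imaginary part when they are real.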


Symmetric Matrices Have Real Eigenvalues 

Our next result, which is concerned with the eigenvalues of real symmetric matrices, is important in a wide variety of 
applications. The key to its proof is to think of a real symmetric matrix as a complex matrix whose entries have an imaginary 
part of zero. 


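THEOREM 5.3.6

If $A$ is a real symmetric matrix, then $A$ has real eigenvalues.

Proof Suppose that $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector, where we allow for the possibility that $\lambda$ is complex and $\mathbf{x}$ is in $C^n$. Thus,

$$A\mathbf{x} = \lambda\mathbf{x}$$

where $\mathbf{x} \ne \mathbf{0}$. If we multiply both sides of this equation by $\bar{\mathbf{x}}^T$ and use the fact that

$$\bar{\mathbf{x}}^T A\mathbf{x} = \bar{\mathbf{x}}^T(\lambda\mathbf{x}) = \lambda(\bar{\mathbf{x}}^T\mathbf{x}) = \lambda(\mathbf{x} \cdot \mathbf{x}) = \lambda\|\mathbf{x}\|^2$$

then we obtain

$$\lambda = \frac{\bar{\mathbf{x}}^T A\mathbf{x}}{\|\mathbf{x}\|^2}$$

Since the denominator in this expression is real, we can prove that $\lambda$ is real by showing that

$$\overline{\bar{\mathbf{x}}^T A\mathbf{x}} = \bar{\mathbf{x}}^T A\mathbf{x} \tag{14}$$

But $A$ is symmetric and has real entries. Using the properties of the conjugate, together with the fact that a $1 \times 1$ matrix equals its own transpose, we obtain

$$\overline{\bar{\mathbf{x}}^T A\mathbf{x}} = \mathbf{x}^T\bar{A}\bar{\mathbf{x}} = \mathbf{x}^T A\bar{\mathbf{x}} = \left(\mathbf{x}^T A\bar{\mathbf{x}}\right)^T = \bar{\mathbf{x}}^T A^T\mathbf{x} = \bar{\mathbf{x}}^T A\mathbf{x}$$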
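The key fact in this proof is that $\bar{\mathbf{x}}^T A\mathbf{x}$ is real whenever $A$ is real and symmetric, and this is easy to observe numerically. In the Python sketch below, the symmetric matrix `S` and the complex vector `x` are arbitrary choices for illustration. The nonsymmetric matrix `N` shows that the conclusion can fail without symmetry.

```python
def hermitian_form(A, x):
    """Compute conj(x)^T A x, the numerator in the formula for lambda above."""
    n = len(A)
    return sum(complex(x[i]).conjugate() * A[i][j] * x[j]
               for i in range(n) for j in range(n))


S = [[4, 1, 2], [1, 0, 3], [2, 3, -1]]   # real symmetric (arbitrary example)
N = [[0, 1], [-1, 0]]                    # real but not symmetric
x = [1 + 2j, -1j, 3 - 1j]                # arbitrary complex vector

print(hermitian_form(S, x))        # the imaginary part is 0
print(hermitian_form(N, [1, 1j]))  # the imaginary part is 2, so the value is not real
```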


A Geometric Interpretation of Complex Eigenvalues 

The following theorem is the key to understanding the geometric significance of complex eigenvalues of real 2x2 matrices. 


THEOREM 5.3.7

The eigenvalues of the real matrix

$$C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \tag{15}$$

are $\lambda = a \pm bi$. If $a$ and $b$ are not both zero, then this matrix can be factored as

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix}\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \tag{16}$$

where $\phi$ is the angle from the positive $x$-axis to the ray that joins the origin to the point $(a, b)$ (Figure 5.3.2).

Geometrically, this theorem states that multiplication by a matrix of form (15) can be viewed as a rotation through the angle $\phi$ followed by a scaling with factor $|\lambda|$ (Figure 5.3.3).



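Proof The characteristic equation of $C$ is $(\lambda - a)^2 + b^2 = 0$ (verify), from which it follows that the eigenvalues of $C$ are $\lambda = a \pm bi$. Assuming that $a$ and $b$ are not both zero, let $\phi$ be the angle from the positive $x$-axis to the ray that joins the origin to the point $(a, b)$. The angle $\phi$ is an argument of the eigenvalue $\lambda = a + bi$, so we see from Figure 5.3.2 that

$$a = |\lambda|\cos\phi \quad\text{and}\quad b = |\lambda|\sin\phi$$

It follows from this that the matrix in (15) can be written as

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix}\begin{bmatrix} \frac{a}{|\lambda|} & -\frac{b}{|\lambda|} \\ \frac{b}{|\lambda|} & \frac{a}{|\lambda|} \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix}\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$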
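Formula (16) can be turned into a small routine that extracts the scaling factor $|\lambda|$ and the angle $\phi$ from the entries $a$ and $b$ of a matrix of form (15). This Python sketch uses `math.hypot` and `math.atan2`. Here `atan2(b, a)` returns an angle in $[-\pi, \pi]$, which is one convenient choice of argument. The function names are illustrative.

```python
import math


def scaling_and_angle(a, b):
    """For C = [[a, -b], [b, a]] with (a, b) != (0, 0), return (|lambda|, phi) as in (16)."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    return math.hypot(a, b), math.atan2(b, a)


def scaled_rotation(modulus, phi):
    """Rebuild |lambda| * R_phi, the product on the right side of (16)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[modulus * c, -modulus * s], [modulus * s, modulus * c]]


modulus, phi = scaling_and_angle(1, 1)   # C = [[1, -1], [1, 1]]
print(modulus, math.degrees(phi))        # approximately 1.4142 and 45.0
```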





The following theorem, whose proof is considered in the exercises, shows that every real $2 \times 2$ matrix with complex eigenvalues is similar to a matrix of form (15).

THEOREM 5.3.8

Let $A$ be a real $2 \times 2$ matrix with complex eigenvalues $\lambda = a \pm bi$ (where $b \ne 0$). If $\mathbf{x}$ is an eigenvector of $A$ corresponding to $\lambda = a - bi$, then the matrix $P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})]$ is invertible and

$$A = P\begin{bmatrix} a & -b \\ b & a \end{bmatrix}P^{-1} \tag{17}$$


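EXAMPLE 5 A Matrix Factorization Using Complex Eigenvalues

Factor the matrix in Example 3 into form (17) using the eigenvalue $\lambda = -i$ and the corresponding eigenvector that was given in (11).

Solution For consistency with the notation in Theorem 5.3.8, let us denote the eigenvector in (11) that corresponds to $\lambda = -i$ by $\mathbf{x}$ (rather than $\bar{\mathbf{x}}$ as before). For this $\lambda$ and $\mathbf{x}$ we have

$$a = 0, \quad b = 1, \quad \operatorname{Re}(\mathbf{x}) = \begin{bmatrix} -\tfrac{2}{5} \\ 1 \end{bmatrix}, \quad \operatorname{Im}(\mathbf{x}) = \begin{bmatrix} -\tfrac{1}{5} \\ 0 \end{bmatrix}$$

Thus,

$$P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})] = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix}$$

so $A$ can be factored in form (17) as

$$\begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ -5 & -2 \end{bmatrix}$$

You may want to confirm this by multiplying out the right side.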
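Multiplying out the right side can be done exactly with Python's `fractions.Fraction`, which avoids rounding in entries such as $-\tfrac{2}{5}$. The `matmul` helper below is written for illustration.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[-2, -1], [5, 2]]
P = [[F(-2, 5), F(-1, 5)], [1, 0]]   # [Re(x)  Im(x)] for x = (-2/5 - i/5, 1)
C = [[0, -1], [1, 0]]                # form (15) with a = 0, b = 1
P_inv = [[0, 1], [-5, -2]]

print(matmul(P, P_inv) == [[1, 0], [0, 1]])   # True
print(matmul(matmul(P, C), P_inv) == A)       # True
```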


A Geometric Interpretation of Theorem 5.3.8

To clarify what Theorem 5.3.8 says geometrically, let us denote the matrices on the right side of (16) by $S$ and $R_\phi$, respectively, and then use (16) to rewrite (17) as

$$A = PSR_\phi P^{-1} = P\begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix}\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}P^{-1} \tag{18}$$

If we now view $P$ as the transition matrix from the basis $B = \{\operatorname{Re}(\mathbf{x}), \operatorname{Im}(\mathbf{x})\}$ to the standard basis, then (18) tells us that computing a product $A\mathbf{x}_0$ can be broken down into a three-step process:

Step 1 Map $\mathbf{x}_0$ from standard coordinates into $B$-coordinates by forming the product $P^{-1}\mathbf{x}_0$.

Step 2 Rotate and scale the vector $P^{-1}\mathbf{x}_0$ by forming the product $SR_\phi P^{-1}\mathbf{x}_0$.

Step 3 Map the rotated and scaled vector back to standard coordinates to obtain $A\mathbf{x}_0 = PSR_\phi P^{-1}\mathbf{x}_0$.


Power Sequences

There are many problems in which one is interested in how successive applications of a matrix transformation affect a specific vector. For example, if $A$ is the standard matrix for an operator on $R^n$ and $\mathbf{x}_0$ is some fixed vector in $R^n$, then one might be interested in the behavior of the power sequence

$$\mathbf{x}_0, \; A\mathbf{x}_0, \; A^2\mathbf{x}_0, \; \ldots, \; A^k\mathbf{x}_0, \; \ldots$$

For example, if

$$A = \begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix} \quad\text{and}\quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

then with the help of a computer or calculator one can show that the first four terms in the power sequence are

$$\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A\mathbf{x}_0 = \begin{bmatrix} 1.25 \\ 0.5 \end{bmatrix}, \quad A^2\mathbf{x}_0 = \begin{bmatrix} 1.0 \\ -0.2 \end{bmatrix}, \quad A^3\mathbf{x}_0 = \begin{bmatrix} 0.35 \\ -0.82 \end{bmatrix}$$

With the help of MATLAB or a computer algebra system one can show that if the first 100 terms are plotted as ordered pairs $(x, y)$, then the points move along the elliptical path shown in Figure 5.3.4a.
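The first few terms of the power sequence can be generated exactly with rational arithmetic. The Python sketch below uses `fractions.Fraction`, so the values $1.25$, $0.5$, and so on are computed as exact fractions and only converted to decimals for printing. The helper name `power_sequence` is illustrative.

```python
from fractions import Fraction as F

A = [[F(1, 2), F(3, 4)], [F(-3, 5), F(11, 10)]]
x0 = [F(1), F(1)]


def power_sequence(A, x, count):
    """Return the first `count` terms x, Ax, A^2 x, ... of the power sequence."""
    terms = [x]
    for _ in range(count - 1):
        x = [A[0][0] * x[0] + A[0][1] * x[1],
             A[1][0] * x[0] + A[1][1] * x[1]]
        terms.append(x)
    return terms


for term in power_sequence(A, x0, 4):
    print([float(c) for c in term])
# [1.0, 1.0]
# [1.25, 0.5]
# [1.0, -0.2]
# [0.35, -0.82]
```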

Figure 5.3.4 

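To understand why the points move along an elliptical path, we will need to examine the eigenvalues and eigenvectors of $A$. We leave it for you to show that the eigenvalues of $A$ are $\lambda = \tfrac{4}{5} \pm \tfrac{3}{5}i$ and that corresponding eigenvectors are

$$\lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i: \; \mathbf{v}_1 = \left(\tfrac{1}{2} + i, 1\right) \qquad\text{and}\qquad \lambda_2 = \tfrac{4}{5} + \tfrac{3}{5}i: \; \mathbf{v}_2 = \left(\tfrac{1}{2} - i, 1\right)$$

If we take $\lambda = \lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i$ and $\mathbf{x} = \mathbf{v}_1 = \left(\tfrac{1}{2} + i, 1\right)$ in (17) and use the fact that $|\lambda| = 1$, then we obtain the factorization

$$\underbrace{\begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}}_{P}\underbrace{\begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix}}_{R_\phi}\underbrace{\begin{bmatrix} 0 & 1 \\ 1 & -\tfrac{1}{2} \end{bmatrix}}_{P^{-1}} \tag{19}$$

where $R_\phi$ is a rotation about the origin through the angle $\phi$ whose tangent is

$$\tan\phi = \frac{\sin\phi}{\cos\phi} = \frac{3/5}{4/5} = \frac{3}{4} \qquad \left(\phi = \tan^{-1}\tfrac{3}{4} \approx 36.9^\circ\right)$$

The matrix $P$ in (19) is the transition matrix from the basis

$$B = \{\operatorname{Re}(\mathbf{x}), \operatorname{Im}(\mathbf{x})\} = \left\{\left(\tfrac{1}{2}, 1\right), (1, 0)\right\}$$

to the standard basis, and $P^{-1}$ is the transition matrix from the standard basis to the basis $B$ (Figure 5.3.5). Next, observe that if $n$ is a positive integer, then (19) implies that

$$A^n\mathbf{x}_0 = (PR_\phi P^{-1})^n\mathbf{x}_0 = PR_\phi^n P^{-1}\mathbf{x}_0$$

so the product $A^n\mathbf{x}_0$ can be computed in three stages. First, map $\mathbf{x}_0$ into the point $P^{-1}\mathbf{x}_0$ in $B$-coordinates. Next, multiply by $R_\phi^n$ to rotate this point about the origin through the angle $n\phi$. Finally, multiply $R_\phi^n P^{-1}\mathbf{x}_0$ by $P$ to map the resulting point back to standard coordinates. We can now see what is happening geometrically: In $B$-coordinates each successive multiplication by $A$ causes the point to advance through an angle $\phi$, thereby tracing a circular orbit about the origin. However, the basis $B$ is skewed (not orthogonal), so when the points on the circular orbit are transformed back to standard coordinates, the effect is to distort the circular orbit into the elliptical orbit traced by $A^n\mathbf{x}_0$ (Figure 5.3.4b). Here are the computations for the first step (successive steps are illustrated in Figure 5.3.4c):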


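$$A\mathbf{x}_0 = \begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & -\tfrac{1}{2} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

$$= \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix}\begin{bmatrix} 1 \\ \tfrac{1}{2} \end{bmatrix} \qquad [\mathbf{x}_0 \text{ is mapped to } B\text{-coordinates.}]$$

$$= \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \qquad [\text{The point } (1, \tfrac{1}{2}) \text{ is rotated through the angle } \phi.]$$

$$= \begin{bmatrix} \tfrac{5}{4} \\ \tfrac{1}{2} \end{bmatrix} \qquad [\text{The point } (\tfrac{1}{2}, 1) \text{ is mapped to standard coordinates.}]$$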
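The factorization (19) and the three-step computation can be verified exactly. The sketch below also checks the geometric claim. Because $R_\phi$ is a rotation, the length of $P^{-1}A^n\mathbf{x}_0$ (the point in $B$-coordinates) never changes, which is why the points stay on a single closed curve. The helper names are illustrative, and `b_length_sq` is a hypothetical helper introduced for this check, not a formula from the text.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M, x):
    """Product of a matrix with a vector."""
    return [sum(m * c for m, c in zip(row, x)) for row in M]


A = [[F(1, 2), F(3, 4)], [F(-3, 5), F(11, 10)]]
P = [[F(1, 2), 1], [1, 0]]                       # [Re(x)  Im(x)] for x = (1/2 + i, 1)
R = [[F(4, 5), F(-3, 5)], [F(3, 5), F(4, 5)]]    # R_phi with cos(phi) = 4/5, sin(phi) = 3/5
P_inv = [[0, 1], [1, F(-1, 2)]]

print(matmul(matmul(P, R), P_inv) == A)          # factorization (19): True

x0 = [1, 1]
step1 = matvec(P_inv, x0)   # x0 in B-coordinates: (1, 1/2)
step2 = matvec(R, step1)    # rotated through phi: (1/2, 1)
step3 = matvec(P, step2)    # back to standard coordinates: (5/4, 1/2) = A x0


def b_length_sq(x):
    """Hypothetical helper: squared length of P^{-1} x, which R_phi leaves unchanged."""
    y = matvec(P_inv, x)
    return y[0] ** 2 + y[1] ** 2


x, lengths = x0, set()
for _ in range(100):
    lengths.add(b_length_sq(x))
    x = matvec(A, x)
print(lengths)              # a single value, Fraction(5, 4)
```

Expanding `b_length_sq` gives $y^2 + (x - y/2)^2 = \tfrac{5}{4}$ for a point $(x, y)$, which is the equation of the ellipse traced by the power sequence.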



Concept Review 

Real part of z 
Imaginary part of z 
Modulus of z 
Complex conjugate of z 
Argument of z 
Polar form of z 
Complex vector space 
Complex n-tuple
Complex n-space
Real matrix
Complex matrix
Complex Euclidean inner product
Euclidean norm on $C^n$
Antisymmetry property
Complex eigenvalue
Complex eigenvector
Eigenspace in $C^n$
Discriminant

Skills 

Find the real part, imaginary part, and complex conjugate of a complex matrix or vector. 

Find the determinant of a complex matrix. 

Find complex inner products and norms of complex vectors. 

Find the eigenvalues and bases for the eigenspaces of complex matrices. 

Factor a 2 x 2 real matrix with complex eigenvalues into a product of a scaling matrix and a rotation matrix. 


Exercise Set 5.3 


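In Exercises 1-2, find $\bar{\mathbf{u}}$, $\operatorname{Re}(\mathbf{u})$, $\operatorname{Im}(\mathbf{u})$, and $\|\mathbf{u}\|$.

1. $\mathbf{u} = (2 - i, 4i, 1 + i)$

Answer:

$\bar{\mathbf{u}} = (2 + i, -4i, 1 - i)$, $\operatorname{Re}(\mathbf{u}) = (2, 0, 1)$, $\operatorname{Im}(\mathbf{u}) = (-1, 4, 1)$, $\|\mathbf{u}\| = \sqrt{23}$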

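2. $\mathbf{u} = (6, 1 + 4i, 6 - 2i)$

In Exercises 3-4, show that $\mathbf{u}$, $\mathbf{v}$, and $k$ satisfy Theorem 5.3.1.

3. $\mathbf{u} = (3 - 4i, 2 + i, -6i)$, $\mathbf{v} = (1 + i, 2 - i, 4)$, $k = i$

4. $\mathbf{u} = (6, 1 + 4i, 6 - 2i)$, $\mathbf{v} = (4, 3 + 2i, i - 3)$, $k = -i$

5. Solve the equation $i\mathbf{x} - 3\mathbf{v} = \bar{\mathbf{u}}$ for $\mathbf{x}$, where $\mathbf{u}$ and $\mathbf{v}$ are the vectors in Exercise 3.

Answer:

$\mathbf{x} = (7 - 6i, -4 - 8i, 6 - 12i)$

6. Solve the equation $(1 + i)\mathbf{x} + 2\mathbf{u} = \mathbf{v}$ for $\mathbf{x}$, where $\mathbf{u}$ and $\mathbf{v}$ are the vectors in Exercise 4.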
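In Exercises 7-8, find $\bar{A}$, $\operatorname{Re}(A)$, $\operatorname{Im}(A)$, $\det(A)$, and $\operatorname{tr}(A)$.

7. $A = \begin{bmatrix} -5i & 4 \\ 2 - i & 1 + 5i \end{bmatrix}$

Answer:

$\bar{A} = \begin{bmatrix} 5i & 4 \\ 2 + i & 1 - 5i \end{bmatrix}$, $\operatorname{Re}(A) = \begin{bmatrix} 0 & 4 \\ 2 & 1 \end{bmatrix}$, $\operatorname{Im}(A) = \begin{bmatrix} -5 & 0 \\ -1 & 5 \end{bmatrix}$, $\det(A) = 17 - i$, $\operatorname{tr}(A) = 1$

8. $A = \begin{bmatrix} 4i & 2 - 3i \\ 2 + 3i & 1 \end{bmatrix}$

9. Let $A$ be the matrix given in Exercise 7, and let $B$ be the matrix

$B = \begin{bmatrix} 1 - i \\ 2i \end{bmatrix}$

Confirm that these matrices have the properties stated in Theorem 5.3.2.

10. Let $A$ be the matrix given in Exercise 8, and let $B$ be the matrix

$B = \begin{bmatrix} 5i \\ 1 - 4i \end{bmatrix}$

Confirm that these matrices have the properties stated in Theorem 5.3.2.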

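In Exercises 11-12, compute $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{u} \cdot \mathbf{w}$, and $\mathbf{v} \cdot \mathbf{w}$, and show that the vectors satisfy Formula (5) and parts (a), (b), and (c) of Theorem 5.3.3.

11. $\mathbf{u} = (i, 2i, 3)$, $\mathbf{v} = (4, -2i, 1 + i)$, $\mathbf{w} = (2 - i, 2i, 5 + 3i)$, $k = 2i$

Answer:

$\mathbf{u} \cdot \mathbf{v} = -1 + i$, $\mathbf{u} \cdot \mathbf{w} = 18 - 7i$, $\mathbf{v} \cdot \mathbf{w} = 12 + 6i$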

12. u= (1 4, 3i), v=(3, — 4i, 2 + 3i), w= (1 — i, 4i, 4 — 5z), 

13. Compute $(\bar{\mathbf{u}} \cdot \mathbf{v}) - \mathbf{w} \cdot \mathbf{u}$ for the vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in Exercise 11.

Answer:

$-11 - 14i$

14. Compute (m * w') + (||u||v) • u for the vectors u, v, and w in Exercise 12. 
In Exercises 15-18, find the eigenvalues and bases for the eigenspaces of A. 



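Answer:

$\lambda_1 = 2 - i$, $\mathbf{x}_1 = \begin{bmatrix} 2 - i \\ 1 \end{bmatrix}$; $\lambda_2 = 2 + i$, $\mathbf{x}_2 = \begin{bmatrix} 2 + i \\ 1 \end{bmatrix}$

Answer:

$\lambda_1 = 4 - i$, $\mathbf{x}_1 = \begin{bmatrix} 1 - i \\ 1 \end{bmatrix}$; $\lambda_2 = 4 + i$, $\mathbf{x}_2 = \begin{bmatrix} 1 + i \\ 1 \end{bmatrix}$

18. $A = \begin{bmatrix} 8 & 6 \\ -3 & 2 \end{bmatrix}$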


In Exercises 19-22, each matrix C has form 15. Theorem 5.3.7 implies that C is the product of a scaling matrix with factor 
|A| and a rotation matrix with angle (p. Find |A| and cp for which —< <fi < tr. 


19. $C = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$

Answer:

|λ| = √2, φ = π/4

20. $C = \begin{bmatrix} 0 & 5 \\ -5 & 0 \end{bmatrix}$



21. $C = \begin{bmatrix} 1 & \sqrt{3} \\ -\sqrt{3} & 1 \end{bmatrix}$

Answer:

|λ| = 2, φ = -π/3

22 . r {2 {i 


c= 


-ft 1/2 I 

In Exercises 23-26, find an invertible matrix P and a matrix C of form 15 such that $A = PCP^{-1}$.


II 

rn 

"-1 

4 

Answer: 

p= 

Csl 

1 

II 

TT* 

fS 

'4 

1 

25 m= 

co ro 

1 

j 1 _ 1 

Answer: 

p= 

1 

-1 

p\ 

II 

■5 

1 


, c = 


3 -2 
2 3 


, C = 


5 -3 
3 5 


27. Find all complex scalars k, if any, for which u and v are orthogonal in C³.

(a) u = (2i, i, 3i), v = (i, 6i, k)

(b) u = (k, k, 1 + i), v = (1, -1, 1 - i)

Answer:

(a) $k = -\tfrac{8}{3}i$

(b) None

28. Show that if A is a real n × n matrix and x is a column vector in Cⁿ, then Re(Ax) = A(Re(x)) and Im(Ax) = A(Im(x)).

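29. The matrices

$$\sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad \sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

called Pauli spin matrices, are used in quantum mechanics to study particle spin. The Dirac matrices, which are also used
in quantum mechanics, are expressed in terms of the Pauli spin matrices and the 2 × 2 identity matrix $I_2$ as

$$\beta = \begin{bmatrix} I_2 & 0 \\ 0 & -I_2 \end{bmatrix}, \quad \alpha_x = \begin{bmatrix} 0 & \sigma_1 \\ \sigma_1 & 0 \end{bmatrix}, \quad \alpha_y = \begin{bmatrix} 0 & \sigma_2 \\ \sigma_2 & 0 \end{bmatrix}, \quad \alpha_z = \begin{bmatrix} 0 & \sigma_3 \\ \sigma_3 & 0 \end{bmatrix}$$

(a) Show that $\beta^2 = \alpha_x^2 = \alpha_y^2 = \alpha_z^2$.

(b) Matrices A and B for which AB = -BA are said to be anticommutative. Show that the Dirac matrices are
anticommutative.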


30. If k is a real scalar and v is a vector in Rⁿ, then Theorem 3.2.1 states that ||kv|| = |k| ||v||. Is this relationship also true if k
is a complex scalar and v is a vector in Cⁿ? Justify your answer.

31. Prove part ( c) of Theorem 5.3.1. 

32. Prove Theorem 5.3.2. 

33. Prove that if u and v are vectors in Cⁿ, then

$$u \cdot v = \tfrac{1}{4}\|u + v\|^2 - \tfrac{1}{4}\|u - v\|^2 + \tfrac{i}{4}\|u + iv\|^2 - \tfrac{i}{4}\|u - iv\|^2$$
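Before you prove the identity in Exercise 33, you can spot-check it numerically. This sketch assumes the same convention, u · v = Σ u_k v̄_k with ||x||² = x · x, and uses random complex vectors. A passing check is evidence, not a proof.

```python
import numpy as np


def cdot(u, v):
    # complex Euclidean inner product, conjugating the second factor
    return np.vdot(v, u)


def norm_sq(x):
    # ||x||^2 = x . x is real and nonnegative
    return cdot(x, x).real


rng = np.random.default_rng(0)
u = rng.normal(size=3) + 1j * rng.normal(size=3)
v = rng.normal(size=3) + 1j * rng.normal(size=3)

lhs = cdot(u, v)
rhs = (0.25 * norm_sq(u + v) - 0.25 * norm_sq(u - v)
       + 0.25j * norm_sq(u + 1j * v) - 0.25j * norm_sq(u - 1j * v))
print(np.isclose(lhs, rhs))  # True
```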


34. It follows from Theorem 5.3.7 that the eigenvalues of the rotation matrix

$$R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$

are λ = cos φ ± i sin φ. Prove that if x is an eigenvector corresponding to either eigenvalue, then Re(x) and Im(x) are
orthogonal and have the same length. [Note: This implies that $P = [\operatorname{Re}(x) \mid \operatorname{Im}(x)]$ is a real scalar multiple of an
orthogonal matrix.]


35. The two parts of this exercise lead you through a proof of Theorem 5.3.8.
(a) For notational simplicity, let

$$M = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

and let u = Re(x) and v = Im(x), so P = [u | v]. Show that the relationship Ax = λx implies that

$$Ax = (au + bv) + i(-bu + av)$$

and then equate real and imaginary parts in this equation to show that

$$AP = [Au \mid Av] = [au + bv \mid -bu + av] = PM$$

(b) Show that P is invertible, thereby completing the proof, since the result in part (a) implies that $A = PMP^{-1}$. [Hint: If
P is not invertible, then one of its column vectors is a real scalar multiple of the other, say v = cu. Substitute this into
the equations Au = au + bv and Av = -bu + av obtained in part (a), and show that $(1 + c^2)bu = 0$. Finally, show
that this leads to a contradiction, thereby proving that P is invertible.]
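You can check Theorem 5.3.8, whose proof Exercise 35 outlines, on a concrete matrix. This NumPy sketch assumes an illustrative real matrix of our choosing with eigenvalues 2 ± i. It picks the eigenvalue λ = a - bi with b > 0, forms P = [Re(x) | Im(x)] and M = [[a, -b], [b, a]], and confirms that A = PMP⁻¹.

```python
import numpy as np

# Illustrative real 2 x 2 matrix (our choice) with eigenvalues 2 + i and 2 - i
A = np.array([[4.0, -5.0],
              [1.0, 0.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmin(eigvals.imag)        # choose lambda = a - bi with b > 0
lam, x = eigvals[k], eigvecs[:, k]
a, b = lam.real, -lam.imag

P = np.column_stack([x.real, x.imag])   # P = [Re(x) | Im(x)]
M = np.array([[a, -b],
              [b, a]])

print(np.allclose(A, P @ M @ np.linalg.inv(P)))  # True
```

`numpy.linalg.eig` normalizes eigenvectors, but any nonzero complex multiple of x would also work, since the theorem holds for every eigenvector.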


36. In this problem you will prove the complex analog of the Cauchy-Schwarz inequality.

(a) Prove: If k is a complex number, and u and v are vectors in Cⁿ, then

$$(u - kv) \cdot (u - kv) = u \cdot u - \bar{k}(u \cdot v) - k\overline{(u \cdot v)} + k\bar{k}(v \cdot v)$$

(b) Use the result in part (a) to prove that

$$0 \le u \cdot u - \bar{k}(u \cdot v) - k\overline{(u \cdot v)} + k\bar{k}(v \cdot v)$$

(c) Take k = (u · v)/(v · v) in part (b) to prove that

$$|u \cdot v| \le \|u\|\,\|v\|$$


True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer.
(a) There is a real 5x5 matrix with no real eigenvalues. 








Answer: 


False 

(b) The eigenvalues of a 2 × 2 complex matrix are the solutions of the equation $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$.

Answer: 

True 

(c) Matrices that have the same complex eigenvalues with the same algebraic multiplicities have the same trace. 
Answer: 

False 

(d) If λ is a complex eigenvalue of a real matrix A with a corresponding complex eigenvector v, then $\bar{\lambda}$ is a complex
eigenvalue of A and $\bar{v}$ is a complex eigenvector of A corresponding to $\bar{\lambda}$.

Answer: 

True 

(e) Every eigenvalue of a complex symmetric matrix is real. 

Answer: 

False 

(f) If a 2 × 2 real matrix A has complex eigenvalues and $x_0$ is a vector in R², then the vectors $x_0, Ax_0, A^2x_0, \ldots, A^nx_0, \ldots$
lie on an ellipse.

Answer: 

False 
5.4 Differential Equations

Many laws of physics, chemistry, biology, engineering, and economics are described in terms of “differential
equations,” that is, equations involving functions and their derivatives. In this section we will illustrate one way in
which linear algebra, and in particular eigenvalues and eigenvectors, can be applied to solving systems of differential equations.
Calculus is a prerequisite for this section.


Terminology 

Recall from calculus that a differential equation is an equation involving unknown functions and their derivatives. 
The order of a differential equation is the order of the highest derivative it contains. The simplest differential 
equations are the first-order equations of the form 


$$y' = ay \tag{1}$$

where y = f(x) is an unknown differentiable function to be determined, y' = dy/dx is its derivative, and a is a
constant. As with most differential equations, this equation has infinitely many solutions; they are the functions of the
form

$$y = ce^{ax} \tag{2}$$

where c is an arbitrary constant. That every function of this form is a solution of 1 follows from the computation

$$y' = cae^{ax} = ay$$

and that these are the only solutions is shown in the exercises. Accordingly, we call 2 the general solution of 1. As an
example, the general solution of the differential equation y' = 5y is

$$y = ce^{5x} \tag{3}$$

Often, a physical problem that leads to a differential equation imposes some conditions that enable us to isolate one
particular solution from the general solution. For example, if we require that solution 3 of the equation y' = 5y
satisfy the added condition

$$y(0) = 6 \tag{4}$$

(that is, y = 6 when x = 0), then on substituting these values in 3, we obtain $6 = ce^0 = c$, from which we conclude
that

$$y = 6e^{5x}$$

is the only solution of y' = 5y that satisfies 4.

A condition such as 4, which specifies the value of the general solution at a point, is called an initial condition, and
the problem of solving a differential equation subject to an initial condition is called an initial-value problem.


First-Order Linear Systems 


In this section we will be concerned with solving systems of differential equations of the form 

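$$\begin{aligned} y_1' &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\ y_2' &= a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\ &\;\;\vdots \\ y_n' &= a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n \end{aligned} \tag{5}$$

where $y_1 = f_1(x), y_2 = f_2(x), \ldots, y_n = f_n(x)$ are functions to be determined, and the $a_{ij}$'s are constants. In
matrix notation, 5 can be written as

$$\begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

or, more briefly, as

$$y' = Ay \tag{6}$$

where the notation y' denotes the vector obtained by differentiating each component of y.

A system of differential equations of form 5 is called a first-order linear system.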


EXAMPLE 1 Solution of a Linear System with Initial Conditions 

(a) Write the following system in matrix form:

$$\begin{aligned} y_1' &= 3y_1 \\ y_2' &= -2y_2 \\ y_3' &= 5y_3 \end{aligned} \tag{7}$$

(b) Solve the system.

(c) Find a solution of the system that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 4$, and
$y_3(0) = -2$.


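Solution

(a)

$$\begin{bmatrix} y_1' \\ y_2' \\ y_3' \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \tag{8}$$

or

$$y' = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} y \tag{9}$$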
















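(b) Because each equation in 7 involves only one unknown function, we can solve the equations
individually. It follows from 2 that these solutions are

$$\begin{aligned} y_1 &= c_1e^{3x} \\ y_2 &= c_2e^{-2x} \\ y_3 &= c_3e^{5x} \end{aligned}$$

or, in matrix notation,

$$y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} c_1e^{3x} \\ c_2e^{-2x} \\ c_3e^{5x} \end{bmatrix}$$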

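(c) From the given initial conditions, we obtain

$$\begin{aligned} 1 &= y_1(0) = c_1e^0 = c_1 \\ 4 &= y_2(0) = c_2e^0 = c_2 \\ -2 &= y_3(0) = c_3e^0 = c_3 \end{aligned}$$

so the solution satisfying these conditions is

$$y_1 = e^{3x}, \quad y_2 = 4e^{-2x}, \quad y_3 = -2e^{5x}$$

or, in matrix notation,

$$y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} e^{3x} \\ 4e^{-2x} \\ -2e^{5x} \end{bmatrix} \tag{10}$$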


Solution by Diagonalization 


What made the system in Example 1 easy to solve was the fact that each equation involved only one of the unknown 
functions, so its matrix formulation, y' = Ay, had a diagonal coefficient matrix A [Formula 9]. A more complicated 
situation occurs when some or all of the equations in the system involve more than one of the unknown functions, for 
in this case the coefficient matrix is not diagonal. Let us now consider how we might solve such a system. 


The basic idea for solving a system y' = Ay whose coefficient matrix A is not diagonal is to introduce a new
unknown vector u that is related to the unknown vector y by an equation of the form y = Pu in which P is an
invertible matrix that diagonalizes A. Of course, such a matrix may or may not exist, but if it does then we can rewrite
the equation y' = Ay as

$$Pu' = A(Pu)$$

or alternatively as

$$u' = (P^{-1}AP)u$$

Since P is assumed to diagonalize A, this equation has the form

$$u' = Du$$










where D is diagonal. We can now solve this equation for u using the method of Example 1, and then obtain y by
matrix multiplication using the relationship y = Pu.

In summary, we have the following procedure for solving a system y' = Ay in the case where A is diagonalizable.

A Procedure for Solving y' = Ay if A is Diagonalizable

Step 1. Find a matrix P that diagonalizes A.

Step 2. Make the substitutions y = Pu and y' = Pu' to obtain a new “diagonal system” u' = Du, where
$D = P^{-1}AP$.

Step 3. Solve u' = Du.

Step 4. Determine y from the equation y = Pu.
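The four steps translate almost line for line into NumPy. The sketch below is a minimal version that assumes A is diagonalizable with real eigenvalues (it does not check this). It uses `numpy.linalg.eig` for Step 1 and, for Step 3, takes the constants from the initial condition u(0) = P⁻¹y(0). The function name `solve_diagonalizable_system` is hypothetical. The test data are the coefficient matrix and initial conditions of Example 2, which follows.

```python
import numpy as np


def solve_diagonalizable_system(A, y0, x):
    """Return y(x) for y' = Ay, y(0) = y0, via the four-step procedure.

    Assumes A is diagonalizable with real eigenvalues (not checked here).
    """
    # Step 1: the columns of P are eigenvectors of A, lam holds the eigenvalues
    lam, P = np.linalg.eig(A)
    # Step 2: with y = Pu the system becomes u' = Du, where D = diag(lam)
    # Step 3: u_i(x) = c_i * exp(lam_i * x), and u(0) = c solves P c = y0
    c = np.linalg.solve(P, y0)
    u = c * np.exp(lam * x)
    # Step 4: recover y from y = Pu
    return P @ u


A = np.array([[1.0, 1.0],
              [4.0, -2.0]])   # coefficient matrix of Example 2
y0 = np.array([1.0, 6.0])     # initial conditions of Example 2
print(solve_diagonalizable_system(A, y0, 0.5))
```

Solving P c = y0 avoids forming P⁻¹ explicitly. If A has complex eigenvalues, `eig` returns complex arrays, and this sketch no longer produces the real form of the solution.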


EXAMPLE 2 Solution Using Diagonalization 

(a) Solve the system

$$\begin{aligned} y_1' &= y_1 + y_2 \\ y_2' &= 4y_1 - 2y_2 \end{aligned}$$

(b) Find the solution that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 6$.


Solution 

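The coefficient matrix for the system is

$$A = \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix}$$

As discussed in Section 5.2, A will be diagonalized by any matrix P whose columns are linearly
independent eigenvectors of A. Since

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{vmatrix} = \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2)$$

the eigenvalues of A are λ = 2 and λ = -3. By definition,

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

is an eigenvector of A corresponding to λ if and only if x is a nontrivial solution of

$$\begin{bmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

If λ = 2, this system becomes

$$\begin{bmatrix} 1 & -1 \\ -4 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Solving this system yields $x_1 = t$, $x_2 = t$, so

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Thus,

$$p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to λ = 2. Similarly, you can show that

$$p_2 = \begin{bmatrix} -\tfrac{1}{4} \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to λ = -3. Thus,

$$P = \begin{bmatrix} 1 & -\tfrac{1}{4} \\ 1 & 1 \end{bmatrix}$$

diagonalizes A, and

$$D = P^{-1}AP = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}$$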


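Thus, as noted in Step 2 of the procedure stated above, the substitution

$$y = Pu \quad \text{and} \quad y' = Pu'$$

yields the “diagonal system”

$$u' = Du = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix} u \quad \text{or} \quad \begin{aligned} u_1' &= 2u_1 \\ u_2' &= -3u_2 \end{aligned}$$

From 2 the solution of this system is

$$\begin{aligned} u_1 &= c_1e^{2x} \\ u_2 &= c_2e^{-3x} \end{aligned} \quad \text{or} \quad u = \begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix}$$

so the equation y = Pu yields, as the solution for y,

$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & -\tfrac{1}{4} \\ 1 & 1 \end{bmatrix} \begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix} = \begin{bmatrix} c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\ c_1e^{2x} + c_2e^{-3x} \end{bmatrix}$$

or

$$\begin{aligned} y_1 &= c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\ y_2 &= c_1e^{2x} + c_2e^{-3x} \end{aligned} \tag{11}$$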


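If we substitute the given initial conditions in 11, we obtain

$$\begin{aligned} c_1 - \tfrac{1}{4}c_2 &= 1 \\ c_1 + c_2 &= 6 \end{aligned}$$

Solving this system, we obtain $c_1 = 2$, $c_2 = 4$, so it follows from 11 that the solution satisfying
the initial conditions is

$$\begin{aligned} y_1 &= 2e^{2x} - e^{-3x} \\ y_2 &= 2e^{2x} + 4e^{-3x} \end{aligned}$$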

























Keep in mind that the method of Example 2 works because the coefficient matrix of the system can be 
diagonalized. In cases where this is not so, other methods are required. These are typically discussed in books 
devoted to differential equations. 


Concept Review 

Differential equation 
Order of a differential equation 
General solution 
Particular solution 
Initial condition 
Initial-value problem 
First-order linear system 

Skills 

Find the matrix form of a system of linear differential equations. 

Find the general solution of a system of linear differential equations by diagonalization. 

Find the particular solution of a system of linear differential equations satisfying an initial condition. 


Exercise Set 5.4 

1. (a) Solve the system

$$\begin{aligned} y_1' &= y_1 + 4y_2 \\ y_2' &= 2y_1 + 3y_2 \end{aligned}$$

(b) Find the solution that satisfies the initial conditions $y_1(0) = 0$, $y_2(0) = 0$.

Answer:

(a) $y_1 = c_1e^{5x} - 2c_2e^{-x}$, $y_2 = c_1e^{5x} + c_2e^{-x}$

(b) $y_1 = 0$, $y_2 = 0$

2. (a) Solve the system

$$\begin{aligned} y_1' &= y_1 + 3y_2 \\ y_2' &= 4y_1 + 5y_2 \end{aligned}$$

(b) Find the solution that satisfies the conditions $y_1(0) = 2$, $y_2(0) = 1$.
3. (a) Solve the system

$$\begin{aligned} y_1' &= 4y_1 + y_3 \\ y_2' &= -2y_1 + y_2 \\ y_3' &= -2y_1 + y_3 \end{aligned}$$

(b) Find the solution that satisfies the initial conditions $y_1(0) = -1$, $y_2(0) = 1$, $y_3(0) = 0$.
Answer:

(a) $y_1 = -c_2e^{2x} + c_3e^{3x}$
$y_2 = c_1e^{x} + 2c_2e^{2x} - c_3e^{3x}$
$y_3 = 2c_2e^{2x} - c_3e^{3x}$

(b) $y_1 = e^{2x} - 2e^{3x}$
$y_2 = e^{x} - 2e^{2x} + 2e^{3x}$
$y_3 = -2e^{2x} + 2e^{3x}$

4. Solve the system

$$\begin{aligned} y_1' &= 4y_1 + 2y_2 + 2y_3 \\ y_2' &= 2y_1 + 4y_2 + 2y_3 \\ y_3' &= 2y_1 + 2y_2 + 4y_3 \end{aligned}$$

5. Show that every solution of y' = ay has the form $y = ce^{ax}$.

[Hint: Let y = f(x) be a solution of the equation, and show that $f(x)e^{-ax}$ is constant.]

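6. Show that if A is diagonalizable and

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

is a solution of the system y' = Ay, then each $y_i$ is a linear combination of $e^{\lambda_1x}, e^{\lambda_2x}, \ldots, e^{\lambda_nx}$, where
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A.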

7. Sometimes it is possible to solve a single higher-order linear differential equation with constant coefficients by
expressing it as a system and applying the methods of this section. For the differential equation y'' - y' - 6y = 0,
show that the substitutions $y_1 = y$ and $y_2 = y'$ lead to the system

$$\begin{aligned} y_1' &= y_2 \\ y_2' &= 6y_1 + y_2 \end{aligned}$$

Solve this system, and use the result to solve the original differential equation.

Answer:

$y = c_1e^{3x} + c_2e^{-2x}$

8. Use the procedure in Exercise 7 to solve y'' + y' - 12y = 0.

9. Explain how you might use the procedure in Exercise 7 to solve y''' - 6y'' + 11y' - 6y = 0. Use your
procedure to solve the equation.

Answer:

$y = c_1e^{x} + c_2e^{2x} + c_3e^{3x}$

10. (a) By rewriting 11 in matrix form, show that the solution of the system in Example 2 can be expressed as

$$y = c_1e^{2x}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2e^{-3x}\begin{bmatrix} -\tfrac{1}{4} \\ 1 \end{bmatrix}$$

This is called the general solution of the system.

(b) Note that in part (a), the vector in the first term is an eigenvector corresponding to the eigenvalue $\lambda_1 = 2$, and
the vector in the second term is an eigenvector corresponding to the eigenvalue $\lambda_2 = -3$. This is a special
case of the following general result:

Theorem. If the coefficient matrix A of the system y' = Ay is diagonalizable, then the general
solution of the system can be expressed as

$$y = c_1e^{\lambda_1x}x_1 + c_2e^{\lambda_2x}x_2 + \cdots + c_ne^{\lambda_nx}x_n$$

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A, and $x_i$ is an eigenvector of A corresponding to $\lambda_i$.

Prove this result by tracing through the four-step procedure preceding Example 2 with

$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \quad \text{and} \quad P = [x_1 \mid x_2 \mid \cdots \mid x_n]$$


11. Consider the system of differential equations y' = Ay, where A is a 2 × 2 matrix. For what values of
$a_{11}, a_{12}, a_{21}, a_{22}$ do the component solutions $y_1(t), y_2(t)$ tend to zero as t → ∞? In particular, what must be
true about the determinant and the trace of A for this to happen?

12. Solve the nondiagonalizable system

$$\begin{aligned} y_1' &= y_1 + y_2 \\ y_2' &= y_2 \end{aligned}$$


True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer.
(a) Every system of differential equations y' = Ay has a solution.








Answer: 


True

(b) If x' = Ax and y' = Ay, then x = y.

Answer: 

False 

(c) If x' = Ax and y' = Ay, then (cx + dy)' = A(cx + dy) for all scalars c and d.

Answer: 

True 

(d) If A is a square matrix with distinct real eigenvalues, then it is possible to solve x' = Ax by diagonalization.
Answer: 

True 

(e) If A and P are similar matrices, then y' = Ay and y' = Py have the same solutions.

Answer: 

False 
Supplementary Exercises


1. (a) Show that if 0 < θ < π, then

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

has no real eigenvalues and consequently no real eigenvectors.

(b) Give a geometric explanation of the result in part (a).

Answer:

(b) The transformation rotates vectors through the angle θ; therefore, if 0 < θ < π, then no nonzero vector
is transformed into a vector in the same or opposite direction.


2. Find the eigenvalues of

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ k^3 & -3k^2 & 3k \end{bmatrix}$$


3. (a) Show that if D is a diagonal matrix with nonnegative entries on the main diagonal, then there is a
matrix S such that $S^2 = D$.

(b) Show that if A is a diagonalizable matrix with nonnegative eigenvalues, then there is a matrix S such
that $S^2 = A$.

(c) Find a matrix S such that $S^2 = A$, given that

$$A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 4 & 5 \\ 0 & 0 & 9 \end{bmatrix}$$

Answer:

(c) $S = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}$

4. Prove: If A is a square matrix, then A and $A^T$ have the same characteristic polynomial.

5. Prove: If A is a square matrix and $p(\lambda) = \det(\lambda I - A)$ is the characteristic polynomial of A, then the
coefficient of $\lambda^{n-1}$ in $p(\lambda)$ is the negative of the trace of A.

6. Prove: If b ≠ 0, then

$$A = \begin{bmatrix} a & b \\ 0 & a \end{bmatrix}$$

is not diagonalizable.

7. In advanced linear algebra, one proves the Cayley-Hamilton Theorem, which states that a square matrix
A satisfies its characteristic equation; that is, if

$$c_0 + c_1\lambda + c_2\lambda^2 + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n = 0$$

is the characteristic equation of A, then

$$c_0I + c_1A + c_2A^2 + \cdots + c_{n-1}A^{n-1} + A^n = 0$$

Verify this result for

(a) $A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}$


In Exercises 8-10, use the Cayley-Hamilton Theorem, stated in Exercise 7.

8. (a) Use Exercise 18 of Section 5.1 to prove the Cayley-Hamilton Theorem for 2 × 2 matrices.

(b) Prove the Cayley-Hamilton Theorem for n × n diagonalizable matrices.


9. The Cayley-Hamilton Theorem provides a method for calculating powers of a matrix. For example, if A
is a 2 × 2 matrix with characteristic equation

$$c_0 + c_1\lambda + \lambda^2 = 0$$

then $c_0I + c_1A + A^2 = 0$, so

$$A^2 = -c_1A - c_0I$$

Multiplying through by A yields $A^3 = -c_1A^2 - c_0A$, which expresses $A^3$ in terms of $A^2$ and A, and
multiplying through by $A^2$ yields $A^4 = -c_1A^3 - c_0A^2$, which expresses $A^4$ in terms of $A^3$ and $A^2$.
Continuing in this way, we can calculate successive powers of A by expressing them in terms of lower
powers. Use this procedure to calculate $A^2$, $A^3$, $A^4$, and $A^5$ for

$$A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}$$

Answer:

$$A^2 = \begin{bmatrix} 15 & 30 \\ 5 & 10 \end{bmatrix}, \quad A^3 = \begin{bmatrix} 75 & 150 \\ 25 & 50 \end{bmatrix}, \quad A^4 = \begin{bmatrix} 375 & 750 \\ 125 & 250 \end{bmatrix}, \quad A^5 = \begin{bmatrix} 1875 & 3750 \\ 625 & 1250 \end{bmatrix}$$
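The recurrence in Exercise 9 can be sketched in a few lines. For a 2 × 2 matrix the characteristic equation is λ² - tr(A)λ + det(A) = 0, so c1 = -tr(A) and c0 = det(A). Multiplying the Cayley-Hamilton relation by A^(k-2) gives A^k = -c1 A^(k-1) - c0 A^(k-2). The dictionary `powers` is just our bookkeeping choice.

```python
import numpy as np

A = np.array([[3, 6],
              [1, 2]])

# Characteristic equation of a 2 x 2 matrix: lambda^2 - tr(A) lambda + det(A) = 0
c1 = -np.trace(A)
c0 = int(round(np.linalg.det(A)))   # det is an integer here; remove float noise

# Cayley-Hamilton: A^k = -c1 A^(k-1) - c0 A^(k-2) for k >= 2
powers = {0: np.eye(2, dtype=int), 1: A}
for k in range(2, 6):
    powers[k] = -c1 * powers[k - 1] - c0 * powers[k - 2]

for k in range(2, 6):
    print(k, powers[k].tolist())
```

Here det(A) = 0 and tr(A) = 5, so each power is simply 5 times the previous one. That explains the pattern in the answer above.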


10. Use the method of the preceding exercise to calculate $A^3$ and $A^4$ for

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}$$


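11. Find the eigenvalues of the matrix

$$A = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \\ c_1 & c_2 & \cdots & c_n \\ \vdots & \vdots & & \vdots \\ c_1 & c_2 & \cdots & c_n \end{bmatrix}$$

Answer:

0, tr(A)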

12. (a) It was shown in Exercise 17 of Section 5.1 that if A is an n × n matrix, then the coefficient of $\lambda^n$ in
the characteristic polynomial of A is 1. (A polynomial with this property is called monic.) Show that
the matrix

$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & 0 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix}$$

has characteristic polynomial

$$p(\lambda) = c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n$$

This shows that every monic polynomial is the characteristic polynomial of some matrix. The matrix
in this example is called the companion matrix of p(λ). [Hint: Evaluate all determinants in the
problem by adding a multiple of the second row to the first to introduce a zero at the top of the first
column, and then expanding by cofactors along the first column.]

(b) Find a matrix with characteristic polynomial

$$p(\lambda) = 1 - 2\lambda + \lambda^2 + 3\lambda^3 + \lambda^4$$
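The companion-matrix layout of part (a) is easy to build and check numerically. In this sketch, the hypothetical helper `companion` takes the coefficients c0, ..., c_{n-1} in increasing order. It puts ones on the subdiagonal and -c in the last column, as in part (a). `numpy.poly` returns the characteristic-polynomial coefficients of a square matrix with the highest power first, so for part (b) we expect approximately [1, 3, 1, -2, 1].

```python
import numpy as np


def companion(c):
    """Companion matrix of c0 + c1 x + ... + c_{n-1} x^{n-1} + x^n (layout of part (a))."""
    n = len(c)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)     # ones just below the main diagonal
    C[:, -1] = -np.asarray(c)      # last column is -c0, -c1, ..., -c_{n-1}
    return C


# Part (b): p(lambda) = 1 - 2 lambda + lambda^2 + 3 lambda^3 + lambda^4
C = companion([1, -2, 1, 3])
print(C)
# numpy.poly computes the coefficients from the eigenvalues, so expect small rounding
print(np.poly(C))
```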

13. A square matrix A is called nilpotent if $A^n = 0$ for some positive integer n. What can you say about the
eigenvalues of a nilpotent matrix?

Answer: 


They are all 0. 

14. Prove: If A is an n × n matrix and n is odd, then A has at least one real eigenvalue.

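15. Find a 3 × 3 matrix A that has eigenvalues λ = 0, 1, and -1 with corresponding eigenvectors

$$\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

respectively.

Answer:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & -\tfrac{1}{2} & -\tfrac{1}{2} \\ 1 & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix}$$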



16. Suppose that a 4 × 4 matrix A has eigenvalues $\lambda_1 = 1$, $\lambda_2 = -2$, $\lambda_3 = 3$, and $\lambda_4 = -3$.

(a) Use the method of Exercise 16 of Section 5.1 to find det(A).

(b) Use Exercise 5 above to find tr(A).

17. Let A be a square matrix such that $A^3 = A$. What can you say about the eigenvalues of A?

Answer:

They are all 0, 1, or -1.

18. (a) Solve the system

$$\begin{aligned} y_1' &= y_1 + 3y_2 \\ y_2' &= 2y_1 + 4y_2 \end{aligned}$$

(b) Find the solution satisfying the initial conditions $y_1(0) = 5$ and $y_2(0) = 6$.
Inner Product Spaces


CHAPTER CONTENTS 

Inner Products 

Angle and Orthogonality in Inner Product Spaces 
Gram-Schmidt Process; QR-Decomposition
Best Approximation; Least Squares 
Least Squares Fitting to Data 
Function Approximation; Fourier Series 


INTRODUCTION 

In Chapter 3 we defined the dot product of vectors in R n , and we used that concept to 
define notions of length, angle, distance, and orthogonality. In this chapter we will 
generalize those ideas so they are applicable in any vector space, not just R n . We will also 
discuss various applications of these ideas. 
6.1 Inner Products

In this section we will use the most important properties of the dot product on R n as axioms, which, if satisfied by the vectors 
in a vector space V, will enable us to extend the notions of length, distance, angle, and perpendicularity to general vector 
spaces. 


General Inner Products 

In Definition 4 of Section 3.2 we defined the dot product of two vectors in R n , and in Theorem 3.2.2 we listed four 
fundamental properties of such products. Our first goal in this section is to extend the notion of a dot product to general real 
vector spaces by using those four properties as axioms. We make the following definition. 

Note that Definition 1 applies only to real vector 
spaces. A definition of inner products on complex 
vector spaces is given in the exercises. Since we will 
have little need for complex vector spaces from this 
point on, you can assume that all vector spaces under 
discussion are real, even though some of the theorems 
are also valid in complex vector spaces. 


DEFINITION 1

An inner product on a real vector space V is a function that associates a real number $\langle u, v \rangle$ with each pair of vectors in
V in such a way that the following axioms are satisfied for all vectors u, v, and w in V and all scalars k.

1. $\langle u, v \rangle = \langle v, u \rangle$ [Symmetry axiom]

2. $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$ [Additivity axiom]

3. $\langle ku, v \rangle = k\langle u, v \rangle$ [Homogeneity axiom]

4. $\langle v, v \rangle \ge 0$ and $\langle v, v \rangle = 0$ if and only if v = 0 [Positivity axiom]

A real vector space with an inner product is called a real inner product space.

Because the axioms for a real inner product space are based on properties of the dot product, these inner product space axioms
will be satisfied automatically if we define the inner product of two vectors u and v in Rⁿ to be

$$\langle u, v \rangle = u \cdot v = u_1v_1 + u_2v_2 + \cdots + u_nv_n$$

This inner product is commonly called the Euclidean inner product (or the standard inner product ) on R n to distinguish it 
from other possible inner products that might be defined on R n . We call R n with the Euclidean inner product Euclidean 
n-space. 

Inner products can be used to define notions of norm and distance in a general inner product space just as we did with dot
products in Rⁿ. Recall from Formulas 11 and 19 of Section 3.2 that if u and v are vectors in Euclidean n-space, then norm and
distance can be expressed in terms of the dot product as

$$\|v\| = \sqrt{v \cdot v} \quad \text{and} \quad d(u, v) = \|u - v\| = \sqrt{(u - v) \cdot (u - v)}$$

Motivated by these formulas we make the following definition. 

DEFINITION 2

If V is a real inner product space, then the norm (or length) of a vector v in V is denoted by ||v|| and is defined by

$$\|v\| = \sqrt{\langle v, v \rangle}$$

and the distance between two vectors is denoted by d(u, v) and is defined by

$$d(u, v) = \|u - v\| = \sqrt{\langle u - v, u - v \rangle}$$

A vector of norm 1 is called a unit vector.


The following theorem, which we state without proof, shows that norms and distances in real inner product spaces have many 
of the properties that you might expect. 


THEOREM 6.1.1 

If u and v are vectors in a real inner product space V, and if k is a scalar, then:

(a) ||v|| ≥ 0 with equality if and only if v = 0.

(b) ||kv|| = |k| ||v||.

(c) d(u, v) = d(v, u).

(d) d(u, v) ≥ 0 with equality if and only if u = v.


Although the Euclidean inner product is the most important inner product on Rⁿ, there are various applications in which it is
desirable to modify it by weighting each term differently. More precisely, if

$$w_1, w_2, \ldots, w_n$$

are positive real numbers, which we will call weights, and if $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ are vectors in Rⁿ,
then it can be shown that the formula

$$\langle u, v \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{1}$$

defines an inner product on Rⁿ that we call the weighted Euclidean inner product with weights $w_1, w_2, \ldots, w_n$.

Note that the standard Euclidean inner product is the 
special case of the weighted Euclidean inner product in 
which all the weights are 1. 


EXAMPLE 1 Weighted Euclidean Inner Product 

Let $u = (u_1, u_2)$ and $v = (v_1, v_2)$ be vectors in R². Verify that the weighted Euclidean inner product

$$\langle u, v \rangle = 3u_1v_1 + 2u_2v_2 \tag{2}$$

satisfies the four inner product axioms.

Axiom 1: Interchanging u and v in Formula 2 does not change the sum on the right side, so

$$\langle u, v \rangle = \langle v, u \rangle$$

Axiom 2: If $w = (w_1, w_2)$, then

$$\begin{aligned} \langle u + v, w \rangle &= 3(u_1 + v_1)w_1 + 2(u_2 + v_2)w_2 \\ &= 3(u_1w_1 + v_1w_1) + 2(u_2w_2 + v_2w_2) \\ &= (3u_1w_1 + 2u_2w_2) + (3v_1w_1 + 2v_2w_2) \\ &= \langle u, w \rangle + \langle v, w \rangle \end{aligned}$$

Axiom 3:

$$\langle ku, v \rangle = 3(ku_1)v_1 + 2(ku_2)v_2 = k(3u_1v_1 + 2u_2v_2) = k\langle u, v \rangle$$

Axiom 4: $\langle v, v \rangle = 3(v_1v_1) + 2(v_2v_2) = 3v_1^2 + 2v_2^2 \ge 0$ with equality if and only if $v_1 = v_2 = 0$; that is, if
and only if v = 0.

In Example 1, we are using subscripted w's to
denote the components of the vector w, not the
weights. The weights are the numbers 3 and 2 in
Formula 2.


An Application of Weighted Euclidean Inner Products 

To illustrate one way in which a weighted Euclidean inner product can arise, suppose that some physical experiment has n
possible numerical outcomes

$$x_1, x_2, \ldots, x_n$$

and that a series of m repetitions of the experiment yields these values with various frequencies. Specifically, suppose that $x_1$
occurs $f_1$ times, $x_2$ occurs $f_2$ times, and so forth. Since there are a total of m repetitions of the experiment, it follows that

$$f_1 + f_2 + \cdots + f_n = m$$

Thus, the arithmetic average of the observed numerical values (denoted by $\bar{x}$) is

$$\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{m} = \frac{1}{m}(f_1x_1 + f_2x_2 + \cdots + f_nx_n) \tag{3}$$

If we let

$$f = (f_1, f_2, \ldots, f_n), \quad x = (x_1, x_2, \ldots, x_n), \quad w_1 = w_2 = \cdots = w_n = 1/m$$

then 3 can be expressed as the weighted Euclidean inner product

$$\bar{x} = \langle f, x \rangle = w_1f_1x_1 + w_2f_2x_2 + \cdots + w_nf_nx_n$$
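The observation that an average is a weighted Euclidean inner product can be sketched directly. The hypothetical helper `weighted_inner` implements Formula 1. The outcomes and frequencies below are made up for illustration, and `numpy.average` with `weights=f` serves as an independent check.

```python
import numpy as np


def weighted_inner(u, v, w):
    """Weighted Euclidean inner product <u, v> = w1 u1 v1 + ... + wn un vn."""
    return float(np.sum(np.asarray(w) * np.asarray(u) * np.asarray(v)))


# Hypothetical data: outcome x_i was observed f_i times in m repetitions
x = np.array([1.0, 2.0, 5.0])
f = np.array([3, 1, 2])
m = f.sum()                      # f1 + f2 + f3 = m
w = np.full(len(x), 1.0 / m)     # w1 = w2 = w3 = 1/m

print(weighted_inner(f, x, w))           # the average x-bar, as <f, x>
print(np.average(x, weights=f))          # same value, computed by NumPy
```

For these data both lines give (3·1 + 1·2 + 2·5)/6 = 2.5, up to floating-point rounding.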


EXAMPLE 2 Using a Weighted Euclidean Inner Product

It is important to keep in mind that norm and distance depend on the inner product being used. If the inner product
is changed, then the norms and distances between vectors also change. For example, for the vectors u = (1, 0) and
v = (0, 1) in R² with the Euclidean inner product we have

$$\|u\| = \sqrt{1^2 + 0^2} = 1$$

and

$$d(u, v) = \|u - v\| = \|(1, -1)\| = \sqrt{1^2 + (-1)^2} = \sqrt{2}$$

but if we change to the weighted Euclidean inner product

$$\langle u, v \rangle = 3u_1v_1 + 2u_2v_2$$

we have

$$\|u\| = \langle u, u \rangle^{1/2} = [3(1)(1) + 2(0)(0)]^{1/2} = \sqrt{3}$$

and

$$d(u, v) = \|u - v\| = \langle (1, -1), (1, -1) \rangle^{1/2} = [3(1)(1) + 2(-1)(-1)]^{1/2} = \sqrt{5}$$
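To see concretely that norm and distance depend on the chosen inner product, this sketch computes both under the standard inner product and under the weights 3 and 2 of Example 2. The helper names `inner`, `norm`, and `dist` are hypothetical. Leaving the weights out falls back to the standard Euclidean inner product, which is the case where every weight is 1.

```python
import numpy as np


def inner(u, v, w=None):
    """Euclidean inner product, or the weighted one when weights w are given."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    w = np.ones_like(u) if w is None else np.asarray(w, float)
    return float(np.sum(w * u * v))


def norm(u, w=None):
    return np.sqrt(inner(u, u, w))


def dist(u, v, w=None):
    return norm(np.subtract(u, v), w)


u, v = (1, 0), (0, 1)
print(norm(u), dist(u, v))                  # 1 and sqrt(2)
print(norm(u, (3, 2)), dist(u, v, (3, 2)))  # sqrt(3) and sqrt(5)
```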


Unit Circles and Spheres in Inner Product Spaces 

If V is an inner product space, then the set of points in V that satisfy

$$\|u\| = 1$$

is called the unit sphere or sometimes the unit circle in V.

EXAMPLE 3 Unusual Unit Circles in R²

(a) Sketch the unit circle in an xy-coordinate system in R² using the Euclidean inner product
$\langle u, v \rangle = u_1v_1 + u_2v_2$.

(b) Sketch the unit circle in an xy-coordinate system in R² using the weighted Euclidean inner product
$\langle u, v \rangle = \tfrac{1}{9}u_1v_1 + \tfrac{1}{4}u_2v_2$.

Solution

(a) If u = (x, y), then $\|u\| = \langle u, u \rangle^{1/2} = \sqrt{x^2 + y^2}$, so the equation of the unit circle is $\sqrt{x^2 + y^2} = 1$, or, on
squaring both sides,

$$x^2 + y^2 = 1$$

As expected, the graph of this equation is a circle of radius 1 centered at the origin (Figure 6.1.1a).

(b) If u = (x, y), then $\|u\| = \langle u, u \rangle^{1/2} = \sqrt{\tfrac{1}{9}x^2 + \tfrac{1}{4}y^2}$, so the equation of the unit circle is
$\sqrt{\tfrac{1}{9}x^2 + \tfrac{1}{4}y^2} = 1$, or, on squaring both sides,

$$\frac{x^2}{9} + \frac{y^2}{4} = 1$$

The graph of this equation is the ellipse shown in Figure 6.1.1b.









Figure 6.1.1 (a) The unit circle ||u|| = 1 using the standard Euclidean inner product. (b) The unit circle using a weighted Euclidean inner product.


It may seem odd that the “unit circle” in the second part of the last example turned out to have an elliptical shape. 
This will make more sense if you think of circles and spheres in general vector spaces algebraically (||u|| = 1) rather than 
geometrically. The change in geometry occurs because the norm, not being Euclidean, has the effect of distorting the space that 
we are used to seeing through “Euclidean eyes.” 


Inner Products Generated by Matrices 

The Euclidean inner product and the weighted Euclidean inner products are special cases of a general class of inner products
on Rⁿ called matrix inner products. To define this class of inner products, let u and v be vectors in Rⁿ that are expressed in
column form, and let A be an invertible n × n matrix. It can be shown (Exercise 31) that if u · v is the Euclidean inner product
on Rⁿ, then the formula

$$\langle u, v \rangle = Au \cdot Av \tag{4}$$

also defines an inner product; it is called the inner product on Rⁿ generated by A.

Recall from Table 1 of Section 3.2 that if u and v are in column form, then u · v can be written as $v^Tu$, from which it follows
that 4 can be expressed as

$$\langle u, v \rangle = (Av)^TAu$$

or, equivalently, as

$$\langle u, v \rangle = v^TA^TAu \tag{5}$$
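Formulas 4 and 5 are two ways of writing the same number, and a quick NumPy check makes that concrete. The sketch assumes the generating matrix A = diag(√3, √2), which Example 5 identifies as generating the weighted inner product 3u1v1 + 2u2v2. The vectors u and v are arbitrary test values of our choosing, and `matrix_inner` is a hypothetical helper name.

```python
import numpy as np


def matrix_inner(u, v, A):
    """Inner product on R^n generated by an invertible matrix A: <u, v> = Au . Av."""
    return float(np.dot(A @ u, A @ v))


A = np.array([[np.sqrt(3), 0.0],
              [0.0, np.sqrt(2)]])   # generates 3 u1 v1 + 2 u2 v2
u = np.array([1.0, -2.0])
v = np.array([4.0, 5.0])

print(matrix_inner(u, v, A))         # Formula 4: Au . Av
print(v @ A.T @ A @ u)               # Formula 5: v^T A^T A u
print(3*u[0]*v[0] + 2*u[1]*v[1])     # weighted form: 12 - 20 = -8
```

The first two results may differ from -8 in the last few digits, because √3 and √2 are rounded in floating point.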


EXAMPLE 4 Matrices Generating Weighted Euclidean Inner Products

The standard Euclidean and weighted Euclidean inner products are examples of matrix inner products. The
standard Euclidean inner product on Rⁿ is generated by the n × n identity matrix, since setting A = I in Formula
4 yields

$$\langle u, v \rangle = Iu \cdot Iv = u \cdot v$$

and the weighted Euclidean inner product

$$\langle u, v \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{6}$$

is generated by the matrix

$$A = \begin{bmatrix} \sqrt{w_1} & 0 & 0 & \cdots & 0 \\ 0 & \sqrt{w_2} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \sqrt{w_n} \end{bmatrix} \tag{7}$$

This can be seen by first observing that $A^TA$ is the n × n diagonal matrix whose diagonal entries are the weights
$w_1, w_2, \ldots, w_n$ and then observing that 5 simplifies to 6 when A is the matrix in Formula 7.


EXAMPLE 5 Example 1 Revisited

Every diagonal matrix with positive diagonal
entries generates a weighted inner product.
Why?

The weighted Euclidean inner product $\langle u, v \rangle = 3u_1v_1 + 2u_2v_2$ discussed in Example 1 is the inner product on
R² generated by

$$A = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix}$$


Other Examples of Inner Products 

So far, we have only considered examples of inner products on R n . We will now consider examples of inner products on some 
of the other kinds of vector spaces that we discussed earlier. 

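EXAMPLE 6 An Inner Product on $M_{nn}$

If U and V are $n \times n$ matrices, then the formula

$$\langle U, V\rangle = \operatorname{tr}(U^TV) \qquad (8)$$

defines an inner product on the vector space $M_{nn}$ (see Definition 8 of Section 1.3 for a definition of trace). This can be proved by confirming that the four inner product space axioms are satisfied, but you can visualize why this is so by computing Formula 8 for the $2 \times 2$ matrices

$$U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}$$

This yields

$$\langle U, V\rangle = \operatorname{tr}(U^TV) = u_1v_1 + u_2v_2 + u_3v_3 + u_4v_4$$

which is just the dot product of the corresponding entries in the two matrices. For example, if

$$U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} -1 & 0 \\ 3 & 2 \end{bmatrix}$$

then

$$\langle U, V\rangle = 1(-1) + 2(0) + 3(3) + 4(2) = 16$$

The norm of a matrix U relative to this inner product is

$$\|U\| = \langle U, U\rangle^{1/2} = \sqrt{u_1^2 + u_2^2 + u_3^2 + u_4^2}$$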
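The identity $\operatorname{tr}(U^TV) = u_1v_1 + u_2v_2 + u_3v_3 + u_4v_4$ can be confirmed on the example above. This minimal sketch (plain Python, matrices as lists of rows, illustrative helper names) computes the trace form of Formula 8 and the entrywise sum side by side.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def mat_mul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def trace_inner(U, V):
    """<U, V> = tr(U^T V)  (Formula 8)."""
    return trace(mat_mul(transpose(U), V))


def entrywise_inner(U, V):
    """Sum of products of corresponding entries."""
    return sum(a * b for row_u, row_v in zip(U, V) for a, b in zip(row_u, row_v))


U = [[1, 2], [3, 4]]
V = [[-1, 0], [3, 2]]
print(trace_inner(U, V), entrywise_inner(U, V))  # 16 16
print(trace_inner(U, U))                         # 30, so ||U|| = sqrt(30)
```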


EXAMPLE 7 The Standard Inner Product on $P_n$

If

$$p = a_0 + a_1x + \cdots + a_nx^n \quad \text{and} \quad q = b_0 + b_1x + \cdots + b_nx^n$$

are polynomials in $P_n$, then the following formula defines an inner product on $P_n$ (verify) that we will call the standard inner product on this space:

$$\langle p, q\rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n \qquad (9)$$

The norm of a polynomial p relative to this inner product is

$$\|p\| = \sqrt{\langle p, p\rangle} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2}$$
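Formula 9 is simply the dot product of coefficient vectors. The sketch below assumes a polynomial is stored as its coefficient list $[a_0, a_1, \ldots, a_n]$ (a representation chosen here for illustration) and pads a shorter list with zeros, which amounts to treating missing high-degree coefficients as 0. The sample polynomials are arbitrary.

```python
import math
from itertools import zip_longest


def standard_inner(p, q):
    """<p, q> = a0*b0 + a1*b1 + ...  (Formula 9).

    p and q are coefficient lists [a0, a1, ..., an]; a shorter list is
    padded with zeros (zero high-degree coefficients).
    """
    return sum(a * b for a, b in zip_longest(p, q, fillvalue=0))


def standard_norm(p):
    return math.sqrt(standard_inner(p, p))


p = [1, 2, 3]    # 1 + 2x + 3x^2
q = [4, 0, -1]   # 4 - x^2
print(standard_inner(p, q))  # 1
print(standard_norm(p))      # sqrt(14), about 3.742
```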


EXAMPLE 8 The Evaluation Inner Product on $P_n$

If

$$p = p(x) = a_0 + a_1x + \cdots + a_nx^n \quad \text{and} \quad q = q(x) = b_0 + b_1x + \cdots + b_nx^n$$

are polynomials in $P_n$, and if $x_0, x_1, \ldots, x_n$ are distinct real numbers (called sample points), then the formula

$$\langle p, q\rangle = p(x_0)q(x_0) + p(x_1)q(x_1) + \cdots + p(x_n)q(x_n) \qquad (10)$$

defines an inner product on $P_n$ called the evaluation inner product at $x_0, x_1, \ldots, x_n$. Algebraically, this can be viewed as the dot product in $R^{n+1}$ of the $(n+1)$-tuples

$$(p(x_0), p(x_1), \ldots, p(x_n)) \quad \text{and} \quad (q(x_0), q(x_1), \ldots, q(x_n))$$

and hence the first three inner product axioms follow from properties of the dot product. The fourth inner product axiom follows from the fact that

$$\langle p, p\rangle = [p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2 \ge 0$$

with equality holding if and only if

$$p(x_0) = p(x_1) = \cdots = p(x_n) = 0$$

But a nonzero polynomial of degree n or less can have at most n distinct roots, whereas this p would have the $n + 1$ distinct roots $x_0, x_1, \ldots, x_n$; so it must be that $p = 0$, which proves that the fourth inner product axiom holds.

The norm of a polynomial p relative to the evaluation inner product is

$$\|p\| = \sqrt{\langle p, p\rangle} = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2} \qquad (11)$$


EXAMPLE 9 Working with the Evaluation Inner Product 

Let $P_2$ have the evaluation inner product at the points

$$x_0 = -2, \quad x_1 = 0, \quad x_2 = 2$$

Compute $\langle p, q\rangle$ and $\|p\|$ for the polynomials $p = p(x) = x^2$ and $q = q(x) = 1 + x$.

It follows from Formulas 10 and 11 that

$$\langle p, q\rangle = p(-2)q(-2) + p(0)q(0) + p(2)q(2) = (4)(-1) + (0)(1) + (4)(3) = 8$$

$$\|p\| = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + [p(x_2)]^2} = \sqrt{[p(-2)]^2 + [p(0)]^2 + [p(2)]^2} = \sqrt{4^2 + 0^2 + 4^2} = \sqrt{32} = 4\sqrt{2}$$
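The computation in Example 9 can be reproduced by evaluating both polynomials at the sample points and taking a dot product, exactly as Formulas 10 and 11 describe. This is a minimal sketch assuming coefficient-list storage $[a_0, a_1, \ldots]$ and Horner's rule for evaluation; it uses $p = x^2$ and $q = 1 + x$, consistent with the values $p(-2) = 4$, $p(0) = 0$, $p(2) = 4$ used in the example.

```python
import math


def evaluate(coeffs, x):
    """Evaluate a0 + a1 x + ... + an x^n using Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def evaluation_inner(p, q, points):
    """<p, q> = p(x0)q(x0) + ... + p(xn)q(xn)  (Formula 10)."""
    return sum(evaluate(p, x) * evaluate(q, x) for x in points)


def evaluation_norm(p, points):
    """||p|| from Formula 11."""
    return math.sqrt(evaluation_inner(p, p, points))


points = [-2, 0, 2]
p = [0, 0, 1]   # x^2
q = [1, 1]      # 1 + x
print(evaluation_inner(p, q, points))  # 8
print(evaluation_norm(p, points))      # 4*sqrt(2), about 5.657
```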


CALCULUS REQUIRED 

EXAMPLE 10 An Inner Product on C[a, b]

Let f = f(x) and g = g(x) be two functions in C[a, b] and define

$$\langle f, g\rangle = \int_a^b f(x)g(x)\,dx \qquad (12)$$

We will show that this formula defines an inner product on C[a, b] by verifying the four inner product axioms for functions f = f(x), g = g(x), and h = h(x) in C[a, b]:

1. $\langle f, g\rangle = \int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx = \langle g, f\rangle$

which proves that Axiom 1 holds.

2. $\langle f + g, h\rangle = \int_a^b (f(x) + g(x))h(x)\,dx = \int_a^b f(x)h(x)\,dx + \int_a^b g(x)h(x)\,dx = \langle f, h\rangle + \langle g, h\rangle$

which proves that Axiom 2 holds.

3. $\langle kf, g\rangle = \int_a^b kf(x)g(x)\,dx = k\int_a^b f(x)g(x)\,dx = k\langle f, g\rangle$

which proves that Axiom 3 holds.

4. If f = f(x) is any function in C[a, b], then

$$\langle f, f\rangle = \int_a^b f^2(x)\,dx \ge 0 \qquad (13)$$

since $f^2(x) \ge 0$ for all x in the interval [a, b]. Moreover, because f is continuous on [a, b], the equality holds in Formula 13 if and only if the function f is identically zero on [a, b], that is, if and only if f = 0; and this proves that Axiom 4 holds.
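When an antiderivative is inconvenient, Formula 12 can be approximated numerically. The sketch below is an illustration only: it swaps in composite Simpson's rule (a standard quadrature method, not something the text uses) with an assumed default of n = 1000 subintervals. For $f = g = x$ on [0, 1] the exact value is 1/3, and for $\sin x$ and $\cos x$ on $[0, \pi]$ it is 0.

```python
import math


def simpson(h, a, b, n=1000):
    """Composite Simpson's rule for the integral of h over [a, b] (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3


def integral_inner(f, g, a, b, n=1000):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b]  (Formula 12)."""
    return simpson(lambda x: f(x) * g(x), a, b, n)


def identity(x):
    return x


print(integral_inner(identity, identity, 0.0, 1.0))     # about 1/3
print(integral_inner(math.sin, math.cos, 0.0, math.pi))  # about 0
```

Simpson's rule is exact for polynomials of degree at most 3, so the first result differs from 1/3 only by floating-point rounding.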


CALCULUS REQUIRED 

EXAMPLE 11 Norm of a Vector in C[a, b]

If C[a, b] has the inner product that was defined in Example 10, then the norm of a function f = f(x) relative to this inner product is

$$\|f\| = \langle f, f\rangle^{1/2} = \sqrt{\int_a^b f^2(x)\,dx} \qquad (14)$$

and the unit sphere in this space consists of all functions f in C[a, b] that satisfy the equation

$$\int_a^b f^2(x)\,dx = 1$$

Note that the vector space $P_n$ is a subspace of C[a, b] because polynomials are continuous functions. Thus, Formula 12 defines an inner product on $P_n$.

Recall from calculus that the arc length of a curve y = f(x) over an interval [a, b] is given by the formula

$$L = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx \qquad (15)$$

Do not confuse this concept of arc length with $\|f\|$, which is the length (norm) of f when f is viewed as a vector in C[a, b]. Formulas 14 and 15 are quite different.
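To see concretely how different Formulas 14 and 15 are, take $f(x) = x$ on [0, 1]: its norm is $\sqrt{\int_0^1 x^2\,dx} = 1/\sqrt{3}$, while the arc length of its graph is $\int_0^1 \sqrt{2}\,dx = \sqrt{2}$. The sketch below approximates both with composite Simpson's rule, a quadrature method assumed here for illustration; the derivative is supplied by hand.

```python
import math


def simpson(h, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3


def norm_c(f, a, b):
    """||f|| = sqrt(integral of f(x)^2 over [a, b])  (Formula 14)."""
    return math.sqrt(simpson(lambda x: f(x) ** 2, a, b))


def arc_length(f_prime, a, b):
    """Arc length of y = f(x) over [a, b]  (Formula 15); needs f'."""
    return simpson(lambda x: math.sqrt(1 + f_prime(x) ** 2), a, b)


def f(x):
    return x


def f_prime(x):
    return 1.0


print(norm_c(f, 0.0, 1.0))            # about 0.5774 = 1/sqrt(3)
print(arc_length(f_prime, 0.0, 1.0))  # about 1.4142 = sqrt(2)
```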


Algebraic Properties of Inner Products 

The following theorem lists some of the algebraic properties of inner products that follow from the inner product axioms. This result is a generalization of Theorem 3.2.3, which applied only to the dot product on $R^n$.


THEOREM 6.1.2 


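If u, v, and w are vectors in a real inner product space V, and if k is a scalar, then

(a) $\langle 0, v\rangle = \langle v, 0\rangle = 0$

(b) $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$

(c) $\langle u, v - w\rangle = \langle u, v\rangle - \langle u, w\rangle$

(d) $\langle u - v, w\rangle = \langle u, w\rangle - \langle v, w\rangle$

(e) $k\langle u, v\rangle = \langle u, kv\rangle$

We will prove part (b) and leave the proofs of the remaining parts as exercises.

$$\begin{aligned} \langle u, v + w\rangle &= \langle v + w, u\rangle && \text{[By symmetry]} \\ &= \langle v, u\rangle + \langle w, u\rangle && \text{[By additivity]} \\ &= \langle u, v\rangle + \langle u, w\rangle && \text{[By symmetry]} \end{aligned}$$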

The following example illustrates how Theorem 6.1.2 and the defining properties of inner products can be used to perform 
algebraic computations with inner products. As you read through the example, you will find it instructive to justify the steps. 

EXAMPLE 12 Calculating with Inner Products 

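$$\begin{aligned} \langle u - 2v, 3u + 4v\rangle &= \langle u, 3u + 4v\rangle - \langle 2v, 3u + 4v\rangle \\ &= \langle u, 3u\rangle + \langle u, 4v\rangle - \langle 2v, 3u\rangle - \langle 2v, 4v\rangle \\ &= 3\langle u, u\rangle + 4\langle u, v\rangle - 6\langle v, u\rangle - 8\langle v, v\rangle \\ &= 3\|u\|^2 + 4\langle u, v\rangle - 6\langle u, v\rangle - 8\|v\|^2 \\ &= 3\|u\|^2 - 2\langle u, v\rangle - 8\|v\|^2 \end{aligned}$$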
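Because the identity in Example 12 holds in every real inner product space, it can be spot-checked in any particular one. This minimal sketch assumes the weighted inner product $3u_1v_1 + 2u_2v_2$ on $R^2$ and two arbitrarily chosen integer vectors, so both sides are computed exactly.

```python
def inner(u, v, w=(3, 2)):
    """Weighted Euclidean inner product 3*u1*v1 + 2*u2*v2 (weights assumed)."""
    return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))


def combo(a, u, b, v):
    """Return the linear combination a*u + b*v."""
    return tuple(a * ui + b * vi for ui, vi in zip(u, v))


u, v = (1, 2), (-3, 5)
lhs = inner(combo(1, u, -2, v), combo(3, u, 4, v))   # <u - 2v, 3u + 4v>
rhs = 3 * inner(u, u) - 2 * inner(u, v) - 8 * inner(v, v)
print(lhs, rhs)  # -605 -605
```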


Concept Review 

Inner product axioms 
Euclidean inner product 
Euclidean n-space
Weighted Euclidean inner product 
Unit circle (sphere) 

Matrix inner product 

Norm in an inner product space 

Distance between two vectors in an inner product space 
Examples of inner products 
Properties of inner products 

Skills 

Compute the inner product of two vectors. 

Find the norm of a vector. 

Find the distance between two vectors. 


Show that a given formula defines an inner product. 

Show that a given formula does not define an inner product by demonstrating that at least one of the inner product 
space axioms fails. 


Exercise Set 6.1 

1. Let $\langle u, v\rangle$ be the Euclidean inner product on $R^2$, and let $u = (1, 1)$, $v = (3, 2)$, $w = (0, -1)$, and $k = 3$. Compute the following.

(a) $\langle u, v\rangle$

(b) $\langle kv, w\rangle$

(c) $\langle u + v, w\rangle$

(d) $\|v\|$

(e) $d(u, v)$

(f) $\|u - kv\|$

Answer:

(a) 5

(b) $-6$

(c) $-3$

(d) $\sqrt{13}$

(e) $\sqrt{5}$

(f) $\sqrt{89}$

2. Repeat Exercise 1 for the weighted Euclidean inner product $\langle u, v\rangle = 2u_1v_1 + 3u_2v_2$.

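3. Let $\langle u, v\rangle$ be the Euclidean inner product on $R^2$, and let $u = (3, -2)$, $v = (4, 5)$, $w = (-1, 6)$, and $k = -4$. Verify the following.

(a) $\langle u, v\rangle = \langle v, u\rangle$

(b) $\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle$

(c) $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$

(d) $\langle ku, v\rangle = k\langle u, v\rangle = \langle u, kv\rangle$

(e) $\langle 0, v\rangle = \langle v, 0\rangle = 0$

Answer:

(a) 2

(b) 11

(c) $-13$

(d) $-8$

(e) 0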


4. Repeat Exercise 3 for the weighted Euclidean inner product $\langle u, v\rangle = 4u_1v_1 + 5u_2v_2$.


5. Let $\langle u, v\rangle$ be the inner product on $R^2$ generated by $\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$, and let $u = (2, 1)$, $v = (-1, 1)$, $w = (0, -1)$. Compute the following.

(a) $\langle u, v\rangle$

(b) $\langle v, w\rangle$

(c) $\langle u + v, w\rangle$

(d) $\|v\|$

(e) $d(v, w)$

(f) $\|v - w\|^2$

Answer:

(a) $-5$

(b) 1

(c) $-7$

(d) 1

(e) 1

(f) 1

6. Repeat Exercise 5 for the inner product on $R^2$ generated by
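7. Compute $\langle U, V\rangle$ using the inner product in Example 6.

(a) $U = \begin{bmatrix} 3 & -2 \\ 4 & 8 \end{bmatrix}$, $V = \begin{bmatrix} -1 & 3 \\ 1 & 1 \end{bmatrix}$

(b) $U = \begin{bmatrix} 1 & 2 \\ -3 & 5 \end{bmatrix}$, $V = \begin{bmatrix} 4 & 6 \\ 0 & 8 \end{bmatrix}$

Answer:

(a) 3

(b) 56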

8. Compute $\langle p, q\rangle$ using the inner product in Example 7.

(a) $p = -2 + x + 3x^2$, $q = 4 - 7x^2$

(b) $p = -5 + 2x + x^2$, $q = 3 + 2x - 4x^2$

9. (a) Use Formula 4 to show that $\langle u, v\rangle = 9u_1v_1 + 4u_2v_2$ is the inner product on $R^2$ generated by

$$A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$$

(b) Use the inner product in part (a) to compute $\langle u, v\rangle$ if $u = (-3, 2)$ and $v = (1, 7)$.

Answer:

(b) 29

10. (a) Use Formula 4 to show that

$$\langle u, v\rangle = 5u_1v_1 - u_1v_2 - u_2v_1 + 10u_2v_2$$

is the inner product on $R^2$ generated by

$$A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix}$$

(b) Use the inner product in part (a) to compute $\langle u, v\rangle$ if $u = (0, -3)$ and $v = (6, 2)$.

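11. Let $u = (u_1, u_2)$ and $v = (v_1, v_2)$. In each part, the given expression is an inner product on $R^2$. Find a matrix that generates it.

(a) $\langle u, v\rangle = 3u_1v_1 + 5u_2v_2$

(b) $\langle u, v\rangle = 4u_1v_1 + 6u_2v_2$

Answer:

(a) $\begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{5} \end{bmatrix}$

(b) $\begin{bmatrix} 2 & 0 \\ 0 & \sqrt{6} \end{bmatrix}$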


12. Let $P_2$ have the inner product in Example 7. In each part, find $\|p\|$.

(a) $p = -2 + 3x + 2x^2$

(b) $p = 4 - 3x^2$


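13. Let $M_{22}$ have the inner product in Example 6. In each part, find $\|A\|$.

(a) $A = \begin{bmatrix} -2 & 5 \\ 3 & 6 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

Answer:

(a) $\sqrt{74}$

(b) 0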


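14. Let $P_2$ have the inner product in Example 7. Find $d(p, q)$.

$$p = 3 - x + x^2, \quad q = 2 + 5x^2$$

15. Let $M_{22}$ have the inner product in Example 6. Find $d(A, B)$.

(a) $A = \begin{bmatrix} 2 & 6 \\ 9 & 4 \end{bmatrix}$, $B = \begin{bmatrix} -4 & 7 \\ 1 & 6 \end{bmatrix}$

(b) $A = \begin{bmatrix} -2 & 4 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} -5 & 1 \\ 6 & 2 \end{bmatrix}$

Answer:

(a) $\sqrt{105}$

(b) $\sqrt{47}$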

16. Let $P_2$ have the inner product of Example 9, and let $p = 1 + x + x^2$ and $q = 1 - 2x^2$. Compute the following.

(a) $\langle p, q\rangle$

(b) $\|p\|$

(c) $d(p, q)$

17. Let $P_3$ have the evaluation inner product at the sample points

$$x_0 = -1, \quad x_1 = 0, \quad x_2 = 1, \quad x_3 = 2$$

Find $\langle p, q\rangle$ and $\|p\|$ for $p = x + x^3$ and $q = 1 + x^2$.

Answer:

$\langle p, q\rangle = 50$, $\|p\| = 6\sqrt{3}$

18. In each part, use the given inner product on $R^2$ to find $\|w\|$, where $w = (-1, 3)$.

(a) the Euclidean inner product

(b) the weighted Euclidean inner product $\langle u, v\rangle = 3u_1v_1 + 2u_2v_2$, where $u = (u_1, u_2)$ and $v = (v_1, v_2)$

(c) the inner product generated by the matrix

19. Use the inner products in Exercise 18 to find $d(u, v)$ for $u = (-1, 2)$ and $v = (2, 5)$.

Answer:

(a) $3\sqrt{2}$

(b) $3\sqrt{5}$

(c) $3\sqrt{13}$

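20. Suppose that u, v, and w are vectors such that

$$\langle u, v\rangle = 2, \quad \langle v, w\rangle = -3, \quad \langle u, w\rangle = 5, \quad \|u\| = 1, \quad \|v\| = 2, \quad \|w\| = 7$$

Evaluate the given expression.

(a) $\langle u + v, v + w\rangle$

(b) $\langle 2v - w, 3u + 2w\rangle$

(c) $\langle u - v - 2w, 4u + v\rangle$

(d) $\|u + v\|$

(e) $\|2w - v\|$

(f) $\|u - 2v + 4w\|$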

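21. Sketch the unit circle in $R^2$ using the given inner product.

(a) $\langle u, v\rangle = \frac{1}{4}u_1v_1 + \frac{1}{16}u_2v_2$

(b) $\langle u, v\rangle = 2u_1v_1 + u_2v_2$

Answer:

(a) [Figure: an ellipse with x-intercepts ±2 and y-intercepts ±4]

(b) [Figure]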


22. Find a weighted Euclidean inner product on $R^2$ for which the unit circle is the ellipse shown in the accompanying figure.

Figure Ex-22

23. Let $u = (u_1, u_2)$ and $v = (v_1, v_2)$. Show that the following are inner products on $R^2$ by verifying that the inner product axioms hold.

(a) $\langle u, v\rangle = 3u_1v_1 + 5u_2v_2$

(b) $\langle u, v\rangle = 4u_1v_1 + u_2v_1 + u_1v_2 + 4u_2v_2$

Answer:

For $V = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, $\langle V, V\rangle = -2 < 0$, so Axiom 4 fails.


24. Let $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$. Determine which of the following are inner products on $R^3$. For those that are not, list the axioms that do not hold.

(a) $\langle u, v\rangle = u_1v_1 + u_3v_3$

(b) $\langle u, v\rangle = u_1^2v_1^2 + u_2^2v_2^2 + u_3^2v_3^2$

(c) $\langle u, v\rangle = 2u_1v_1 + u_2v_2 + 4u_3v_3$

(d) $\langle u, v\rangle = u_1v_1 - u_2v_2 + u_3v_3$


25. Show that the following identity holds for vectors in any inner product space.

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$


Answer: 


(a) 

15 

(b) 0 

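26. Show that the following identity holds for vectors in any inner product space.

$$\langle u, v\rangle = \frac{1}{4}\|u + v\|^2 - \frac{1}{4}\|u - v\|^2$$

27. Let $U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix}$ and $V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}$.

Show that $\langle U, V\rangle = u_1v_1 + u_2v_3 + u_3v_2 + u_4v_4$ is not an inner product on $M_{22}$.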


28. Calculus required Let the vector space $P_2$ have the inner product

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

(a) Find $\|p\|$ for $p = 1$, $p = x$, and $p = x^2$.

(b) Find $d(p, q)$ if $p = 1$ and $q = x$.

29. Calculus required Use the inner product

$$\langle p, q\rangle = \int p(x)q(x)\,dx$$

on $P_3$ to compute $\langle p, q\rangle$.

(a) $p = 1 - x + x^2 + 5x^3$, $q = x - 3x^2$

(b) $p = x - 5x^3$, $q = 2 + 8x^2$



30. Calculus required In each part, use the inner product

$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$$

on C[0, 1] to compute $\langle f, g\rangle$.

(a) $f = \cos 2\pi x$, $g = \sin 2\pi x$

(b) $f = x$, $g = e^x$

(c) $f = \tan\frac{\pi x}{4}$, $g = 1$


31. Prove that Formula 4 defines an inner product on $R^n$.

32. The definition of a complex vector space was given in the first margin note in Section 4.1. The definition of a complex inner product on a complex vector space V is identical to Definition 1 except that scalars are allowed to be complex numbers, and Axiom 1 is replaced by $\langle u, v\rangle = \overline{\langle v, u\rangle}$. The remaining axioms are unchanged. A complex vector space with a complex inner product is called a complex inner product space. Prove that if V is a complex inner product space, then $\langle u, kv\rangle = \bar{k}\langle u, v\rangle$.


True-False Exercises 


In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The dot product on $R^2$ is an example of a weighted inner product.

Answer:

True

(b) The inner product of two vectors cannot be a negative real number.

Answer:

False

(c) $\langle u, v + w\rangle = \langle v, u\rangle + \langle w, u\rangle$.

Answer:

True

(d) $\langle ku, kv\rangle = k^2\langle u, v\rangle$.

Answer:

True

(e) If $\langle u, v\rangle = 0$, then $u = 0$ or $v = 0$.

Answer:

False

(f) If $\|v\|^2 = 0$, then $v = 0$.

Answer:

True

(g) If A is an $n \times n$ matrix, then $\langle u, v\rangle = Au \cdot Av$ defines an inner product on $R^n$.

Answer:

False




6.2 Angle and Orthogonality in Inner Product Spaces

In Section 3.2 we defined the notion of “angle” between vectors in $R^n$. In this section we will extend this idea to general vector spaces. This will enable us to extend the notion of orthogonality as well, thereby setting the groundwork for a variety of new applications.


Cauchy-Schwarz Inequality 

Recall from Formula 20 of Section 3.2 that the angle θ between two vectors u and v in $R^n$ is

$$\theta = \cos^{-1}\left(\frac{u \cdot v}{\|u\|\|v\|}\right) \qquad (1)$$

We were assured that this formula was valid because it followed from the Cauchy-Schwarz inequality (Theorem 3.2.4) that

$$-1 \le \frac{u \cdot v}{\|u\|\|v\|} \le 1 \qquad (2)$$

as required for the inverse cosine to be defined. The following generalization of Theorem 3.2.4 will enable us to define the angle between two vectors in any real inner product space.


THEOREM 6.2.1 Cauchy-Schwarz Inequality

If u and v are vectors in a real inner product space V, then

$$|\langle u, v\rangle| \le \|u\|\|v\| \qquad (3)$$


We warn you in advance that the proof presented here depends on a clever trick that is not easy to motivate.

In the case where $u = 0$ the two sides of (3) are equal since $\langle u, v\rangle$ and $\|u\|$ are both zero. Thus, we need only consider the case where $u \ne 0$. Making this assumption, let

$$a = \langle u, u\rangle, \quad b = 2\langle u, v\rangle, \quad c = \langle v, v\rangle$$

and let t be any real number. Since the positivity axiom states that the inner product of any vector with itself is nonnegative, it follows that

$$0 \le \langle tu + v, tu + v\rangle = \langle u, u\rangle t^2 + 2\langle u, v\rangle t + \langle v, v\rangle = at^2 + bt + c$$

This inequality implies that the quadratic polynomial $at^2 + bt + c$ (which has $a > 0$ because $u \ne 0$) has either no real roots or a repeated real root. Therefore, its discriminant must satisfy the inequality $b^2 - 4ac \le 0$. Expressing the coefficients a, b, and c in terms of the vectors u and v gives $4\langle u, v\rangle^2 - 4\langle u, u\rangle\langle v, v\rangle \le 0$ or, equivalently,

$$\langle u, v\rangle^2 \le \langle u, u\rangle\langle v, v\rangle$$

Taking square roots of both sides and using the fact that $\langle u, u\rangle$ and $\langle v, v\rangle$ are nonnegative yields

$$|\langle u, v\rangle| \le \langle u, u\rangle^{1/2}\langle v, v\rangle^{1/2} \quad \text{or equivalently} \quad |\langle u, v\rangle| \le \|u\|\|v\|$$

which completes the proof.


The following two alternative forms of the Cauchy-Schwarz inequality are useful to know:

$$\langle u, v\rangle^2 \le \langle u, u\rangle\langle v, v\rangle \qquad (4)$$

$$\langle u, v\rangle^2 \le \|u\|^2\|v\|^2 \qquad (5)$$

The first of these formulas was obtained in the proof of Theorem 6.2.1, and the second is a variation of the first.
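The discriminant argument in the proof can be tested empirically. The sketch below is an illustration, not a proof: it assumes weighted Euclidean inner products with randomly drawn positive weights, computes $a = \langle u, u\rangle$, $b = 2\langle u, v\rangle$, $c = \langle v, v\rangle$ for random vectors, and checks that $b^2 - 4ac \le 0$ up to a small rounding tolerance (the tolerance and the fixed seed are arbitrary choices).

```python
import random


def weighted_inner(w, u, v):
    return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))


def discriminant(w, u, v):
    """Return (b^2 - 4ac, a, c) for <tu + v, tu + v> = at^2 + bt + c."""
    a = weighted_inner(w, u, u)
    b = 2 * weighted_inner(w, u, v)
    c = weighted_inner(w, v, v)
    return b * b - 4 * a * c, a, c


def check_cauchy_schwarz(trials=1000, n=5, seed=0):
    """Return True if b^2 - 4ac <= 0 (up to rounding) in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.uniform(0.1, 5.0) for _ in range(n)]   # positive weights
        u = [rng.uniform(-10, 10) for _ in range(n)]
        v = [rng.uniform(-10, 10) for _ in range(n)]
        disc, a, c = discriminant(w, u, v)
        if disc > 1e-9 * (4 * a * c + 1):
            return False
    return True


print(check_cauchy_schwarz())  # True
```

When v is a scalar multiple of u, the discriminant is zero (up to rounding), which is the equality case of the inequality.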


Angle Between Vectors 


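Our next goal is to define what is meant by the “angle” between vectors in a real inner product space. As the first step, we leave it for you to use the Cauchy-Schwarz inequality to show that for nonzero vectors u and v

$$-1 \le \frac{\langle u, v\rangle}{\|u\|\|v\|} \le 1 \qquad (6)$$

This being the case, there is a unique angle θ in radian measure for which

$$\cos\theta = \frac{\langle u, v\rangle}{\|u\|\|v\|} \quad \text{and} \quad 0 \le \theta \le \pi \qquad (7)$$

(Figure 6.2.1). This enables us to define the angle θ between u and v to be

$$\theta = \cos^{-1}\left(\frac{\langle u, v\rangle}{\|u\|\|v\|}\right) \qquad (8)$$

Figure 6.2.1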











EXAMPLE 1 Cosine of an Angle Between Two Vectors in $R^4$

Let $R^4$ have the Euclidean inner product. Find the cosine of the angle θ between the vectors $u = (4, 3, 1, -2)$ and $v = (-2, 1, 2, 3)$.

We leave it for you to verify that

$$\|u\| = \sqrt{30}, \quad \|v\| = \sqrt{18}, \quad \text{and} \quad \langle u, v\rangle = -9$$

from which it follows that

$$\cos\theta = \frac{\langle u, v\rangle}{\|u\|\|v\|} = \frac{-9}{\sqrt{30}\sqrt{18}} = -\frac{3}{2\sqrt{15}}$$
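The arithmetic of Example 1 is easy to confirm numerically, and Formula 8 then gives the angle itself. This is a minimal sketch for the Euclidean inner product only; `math.acos` returns a value in $[0, \pi]$, matching the range required in Formula 7.

```python
import math


def cosine_of_angle(u, v):
    """cos(theta) = <u, v> / (||u|| ||v||) for the Euclidean inner product."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


u = (4, 3, 1, -2)
v = (-2, 1, 2, 3)
c = cosine_of_angle(u, v)
print(c)              # about -0.3873, which equals -3/(2*sqrt(15))
theta = math.acos(c)  # angle in radians, 0 <= theta <= pi (Formula 8)
print(theta)
```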


Properties of Length and Distance in General Inner Product Spaces 

In Section 3.2 we used the dot product to extend the notions of length and distance to $R^n$, and we showed that various familiar theorems remained valid (see Theorem 3.2.5, Theorem 3.2.6, and Theorem 3.2.7). By making only minor adjustments to the proofs of those theorems, we can show that they remain valid in any real inner product space. For example, here is the generalization of Theorem 3.2.5 (the triangle inequalities).


THEOREM 6.2.2 

If u, v, and w are vectors in a real inner product space V, and if k is any scalar, then:

(a) $\|u + v\| \le \|u\| + \|v\|$ [Triangle inequality for vectors]

(b) $d(u, v) \le d(u, w) + d(w, v)$ [Triangle inequality for distances]

Proof (a)

$$\begin{aligned} \|u + v\|^2 &= \langle u + v, u + v\rangle \\ &= \langle u, u\rangle + 2\langle u, v\rangle + \langle v, v\rangle \\ &\le \langle u, u\rangle + 2|\langle u, v\rangle| + \langle v, v\rangle && \text{[Property of absolute value]} \\ &\le \langle u, u\rangle + 2\|u\|\|v\| + \langle v, v\rangle && \text{[By (3)]} \\ &= \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 \\ &= (\|u\| + \|v\|)^2 \end{aligned}$$

Taking square roots gives $\|u + v\| \le \|u\| + \|v\|$.

Proof (b) Identical to the proof of part (b) of Theorem 3.2.5.


Orthogonality 

Although Example 1 is a useful mathematical exercise, there is only an occasional need to compute angles in vector spaces other than $R^2$ and $R^3$. A problem of more interest in general vector spaces is ascertaining whether the angle between vectors is π/2. You should be able to see from Formula 8 that if u and v are nonzero vectors, then the angle between them is θ = π/2 if and only if $\langle u, v\rangle = 0$. Accordingly, we make the following definition (which is applicable even if one or both of the vectors is zero).


DEFINITION 1 

Two vectors u and v in an inner product space are called orthogonal if $\langle u, v\rangle = 0$.


As the following example shows, orthogonality depends on the inner product in the sense that for different 
inner products two vectors can be orthogonal with respect to one but not the other. 

EXAMPLE 2 Orthogonality Depends on the Inner Product 

The vectors $u = (1, 1)$ and $v = (1, -1)$ are orthogonal with respect to the Euclidean inner product on $R^2$, since

$$u \cdot v = (1)(1) + (1)(-1) = 0$$

However, they are not orthogonal with respect to the weighted Euclidean inner product $\langle u, v\rangle = 3u_1v_1 + 2u_2v_2$, since

$$\langle u, v\rangle = 3(1)(1) + 2(1)(-1) = 1 \ne 0$$
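Example 2 can be replayed by treating the Euclidean inner product as the weighted one with all weights equal to 1. This minimal sketch uses a small tolerance in the orthogonality test, an assumed choice that matters only for floating-point inputs.

```python
def weighted_inner(w, u, v):
    """Weighted Euclidean inner product; w = (1, 1) gives the dot product."""
    return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))


def is_orthogonal(w, u, v, tol=1e-12):
    return abs(weighted_inner(w, u, v)) <= tol


u, v = (1, 1), (1, -1)
print(is_orthogonal((1, 1), u, v))  # True: Euclidean inner product
print(is_orthogonal((3, 2), u, v))  # False: here <u, v> = 1
```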


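EXAMPLE 3 Orthogonal Vectors in $M_{22}$

If $M_{22}$ has the inner product of Example 6 in the preceding section, then the matrices

$$U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}$$

are orthogonal, since

$$\langle U, V\rangle = 1(0) + 0(2) + 1(0) + 1(0) = 0$$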


CALCULUS REQUIRED 






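EXAMPLE 4 Orthogonal Vectors in $P_2$

Let $P_2$ have the inner product

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

and let $p = x$ and $q = x^2$. Then

$$\|p\| = \langle p, p\rangle^{1/2} = \left[\int_{-1}^{1} x \cdot x\,dx\right]^{1/2} = \left[\int_{-1}^{1} x^2\,dx\right]^{1/2} = \sqrt{\tfrac{2}{3}}$$

$$\|q\| = \langle q, q\rangle^{1/2} = \left[\int_{-1}^{1} x^2 \cdot x^2\,dx\right]^{1/2} = \left[\int_{-1}^{1} x^4\,dx\right]^{1/2} = \sqrt{\tfrac{2}{5}}$$

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx = \int_{-1}^{1} x \cdot x^2\,dx = \int_{-1}^{1} x^3\,dx = 0$$

Because $\langle p, q\rangle = 0$, the vectors $p = x$ and $q = x^2$ are orthogonal relative to the given inner product.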


In Section 3.3 we proved the Theorem of Pythagoras for vectors in Euclidean n-space. The following theorem extends this result to vectors in any real inner product space.

THEOREM 6.2.3 Generalized Theorem of Pythagoras

If u and v are orthogonal vectors in an inner product space, then

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2$$

The orthogonality of u and v implies that $\langle u, v\rangle = 0$, so

$$\|u + v\|^2 = \langle u + v, u + v\rangle = \|u\|^2 + 2\langle u, v\rangle + \|v\|^2 = \|u\|^2 + \|v\|^2$$

CALCULUS REQUIRED 


EXAMPLE 5 Theorem of Pythagoras in P2 











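In Example 4 we showed that $p = x$ and $q = x^2$ are orthogonal with respect to the inner product

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

on $P_2$. It follows from Theorem 6.2.3 that

$$\|p + q\|^2 = \|p\|^2 + \|q\|^2$$

Thus, from the computations in Example 4, we have

$$\|p + q\|^2 = \left(\sqrt{\tfrac{2}{3}}\right)^2 + \left(\sqrt{\tfrac{2}{5}}\right)^2 = \frac{2}{3} + \frac{2}{5} = \frac{16}{15}$$

We can check this result by direct integration:

$$\begin{aligned} \|p + q\|^2 = \langle p + q, p + q\rangle &= \int_{-1}^{1} (x + x^2)(x + x^2)\,dx \\ &= \int_{-1}^{1} x^2\,dx + 2\int_{-1}^{1} x^3\,dx + \int_{-1}^{1} x^4\,dx = \frac{2}{3} + 0 + \frac{2}{5} = \frac{16}{15} \end{aligned}$$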
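For polynomials, the integrals in Examples 4 and 5 can be evaluated exactly with rational arithmetic, which avoids any rounding. This sketch assumes coefficient-list storage $[a_0, a_1, \ldots]$ and uses Python's standard `fractions.Fraction`; it integrates the product polynomial term by term over $[-1, 1]$.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [a0, a1, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integral_inner(p, q, lo=-1, hi=1):
    """Exact <p, q> = integral of p(x) q(x) over [lo, hi]."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(
        (c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, c in enumerate(poly_mul(p, q))),
        Fraction(0),
    )


p = [0, 1]          # x
q = [0, 0, 1]       # x^2
p_plus_q = [0, 1, 1]
print(integral_inner(p, q))                         # 0
print(integral_inner(p, p) + integral_inner(q, q))  # 16/15
print(integral_inner(p_plus_q, p_plus_q))           # 16/15
```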


Orthogonal Complements 

In Section 4.8 we defined the notion of an orthogonal complement for subspaces of $R^n$, and we used that definition to establish a geometric link between the fundamental spaces of a matrix. The following definition extends that idea to general inner product spaces.

DEFINITION 2

If W is a subspace of an inner product space V, then the set of all vectors in V that are orthogonal to every vector in W is called the orthogonal complement of W and is denoted by the symbol $W^\perp$.


In Theorem 4.8.8 we stated three properties of orthogonal complements in $R^n$. The following theorem generalizes parts (a) and (b) of that theorem to general inner product spaces.

THEOREM 6.2.4

If W is a subspace of an inner product space V, then:

(a) $W^\perp$ is a subspace of V.

(b) $W \cap W^\perp = \{0\}$.

Proof (a) The set $W^\perp$ contains at least the zero vector, since $\langle 0, w\rangle = 0$ for every vector w in W. Thus, it remains to show that $W^\perp$ is closed under addition and scalar multiplication. To do this, suppose that u and v are vectors in $W^\perp$, so that for every vector w in W we have $\langle u, w\rangle = 0$ and $\langle v, w\rangle = 0$. It follows from the additivity and homogeneity axioms of inner products that

$$\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle = 0 + 0 = 0$$

$$\langle ku, w\rangle = k\langle u, w\rangle = k(0) = 0$$

which proves that $u + v$ and $ku$ are in $W^\perp$.

Proof (b) If v is any vector in both W and $W^\perp$, then v is orthogonal to itself; that is, $\langle v, v\rangle = 0$. It follows from the positivity axiom for inner products that $v = 0$.

The next theorem, which we state without proof, generalizes part (c) of Theorem 4.8.8. Note, however, that this theorem applies only to finite-dimensional inner product spaces, whereas Theorem 6.2.4 does not have this restriction.


THEOREM 6.2.5

If W is a subspace of a finite-dimensional inner product space V, then the orthogonal complement of $W^\perp$ is W; that is,

$$(W^\perp)^\perp = W$$

Theorem 6.2.5 implies that in a finite-dimensional inner product space orthogonal complements occur in pairs, each being orthogonal to the other (Figure 6.2.2).

Figure 6.2.2 Each vector in W is orthogonal to each vector in $W^\perp$ and conversely.




In our study of the fundamental spaces of a matrix in Section 4.8 we showed that the row space and null space of a matrix are orthogonal complements with respect to the Euclidean inner product on $R^n$ (Theorem 4.8.9). The following example takes advantage of that fact.

EXAMPLE 6 Basis for an Orthogonal Complement 

Let W be the subspace of $R^6$ spanned by the vectors

$$w_1 = (1, 3, -2, 0, 2, 0), \quad w_2 = (2, 6, -5, -2, 4, -3),$$

$$w_3 = (0, 0, 5, 10, 0, 15), \quad w_4 = (2, 6, 0, 8, 4, 18)$$

Find a basis for the orthogonal complement of W.

The space W is the same as the row space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$

Since the row space and null space of A are orthogonal complements, our problem reduces to finding a basis for the null space of this matrix. In Example 4 of Section 4.7 we showed that

$$v_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for this null space. Expressing these vectors in comma-delimited form (to match that of $w_1$, $w_2$, $w_3$, and $w_4$), we obtain the basis vectors

$$v_1 = (-3, 1, 0, 0, 0, 0), \quad v_2 = (-4, 0, -2, 1, 0, 0), \quad v_3 = (-2, 0, 0, 0, 1, 0)$$

You may want to check that these vectors are orthogonal to $w_1$, $w_2$, $w_3$, and $w_4$ by computing the necessary dot products.
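The suggested check amounts to twelve dot products, one for each pair $(w_i, v_j)$. This minimal sketch tabulates them; every entry should be zero, which is the same as saying that $Av_j = 0$ for each j.

```python
W = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]
V = [
    (-3, 1, 0, 0, 0, 0),
    (-4, 0, -2, 1, 0, 0),
    (-2, 0, 0, 0, 1, 0),
]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# Row i, column j holds w_i . v_j.
table = [[dot(w, v) for v in V] for w in W]
print(table)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
```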


Concept Review 

Cauchy-Schwarz inequality 
Angle between vectors 
Orthogonal vectors 
Orthogonal complement 


Skills 










Find the angle between two vectors in an inner product space. 

Determine whether two vectors in an inner product space are orthogonal. 

Find a basis for the orthogonal complement of a subspace of an inner product space. 


Exercise Set 6.2 

1. Let $R^2$, $R^3$, and $R^4$ have the Euclidean inner product. In each part, find the cosine of the angle between u and v.

(a) $u = (1, -3)$, $v = (2, 4)$

(b) $u = (-1, 0)$, $v = (3, 8)$

(c) $u = (-1, 5, 2)$, $v = (2, 4, -9)$

(d) $u = (4, 1, 8)$, $v = (1, 0, -3)$

(e) $u = (1, 0, 1, 0)$, $v = (-3, -3, -3, -3)$

(f) $u = (2, 1, 7, -1)$, $v = (4, 0, 0, 0)$

Answer:

(a) $-\frac{1}{\sqrt{2}}$

(b) $-\frac{3}{\sqrt{73}}$

(c) 0

(d) $-\frac{20}{9\sqrt{10}}$

(e) $-\frac{1}{\sqrt{2}}$

(f) $\frac{2}{\sqrt{55}}$



2. Let $P_2$ have the inner product in Example 7 of Section 6.1. Find the cosine of the angle between p and q.

(a) $p = -1 + 5x + 2x^2$, $q = 2 + 4x - 9x^2$

(b) $p = x - x^2$, $q = 7 + 3x + 3x^2$


3. Let $M_{22}$ have the inner product in Example 6 of Section 6.1. Find the cosine of the angle between A and B.

(a) $A = \begin{bmatrix} 2 & 6 \\ 1 & -3 \end{bmatrix}$, $B = \begin{bmatrix} 3 & 2 \\ 1 & 0 \end{bmatrix}$

(b) 

A = 

1-1 

I 

—*• to 

1 _ 1 

,B = 

'-3 1 
4 2 














Answer:

(a) $\frac{19}{10\sqrt{7}}$

(b) 0

4. In each part, determine whether the given vectors are orthogonal with respect to the Euclidean inner product.

(a) $u = (-1, 3, 2)$, $v = (4, 2, -1)$

(b) $u = (-2, -2, -2)$, $v = (1, 1, 1)$

(c) $u = (u_1, u_2, u_3)$, $v = (0, 0, 0)$

(d) $u = (-4, 6, -10, 1)$, $v = (2, 1, -2, 9)$

(e) $u = (0, 3, -2, 1)$, $v = (5, 2, -1, 0)$

(f) $u = (a, b)$, $v = (b, a)$

5. Show that $p = 1 - x + 2x^2$ and $q = 2x + x^2$ are orthogonal with respect to the inner product in Exercise 2.

6 . Let 



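Which of the following matrices are orthogonal to A with respect to the inner product in Exercise 3?

(a) $\begin{bmatrix} -3 & 0 \\ 0 & 2 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 2 & 1 \\ 5 & 2 \end{bmatrix}$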


7. Do there exist scalars k and l such that the vectors $u = (2, k, 6)$, $v = (l, 5, 3)$, and $w = (1, 2, 3)$ are mutually orthogonal with respect to the Euclidean inner product?

Answer: 

No 

8. Let $R^3$ have the Euclidean inner product, and suppose that $u = (1, 1, -1)$ and $v = (6, 7, -15)$. Find a value of k for which $\|ku + v\| = 13$.

9. Let $R^3$ have the Euclidean inner product. For which values of k are u and v orthogonal?

(a) $u = (2, 1, 3)$, $v = (1, 7, k)$

(b) $u = (k, k, 1)$, $v = (k, 5, 6)$

Answer:

(a) $k = -3$

(b) $k = -2, -3$

10. Let $R^4$ have the Euclidean inner product. Find two unit vectors that are orthogonal to all three of the vectors $u = (2, 1, -4, 0)$, $v = (-1, -1, 2, 2)$, and $w = (3, 2, 5, 4)$.

11. In each part, verify that the Cauchy-Schwarz inequality holds for the given vectors using the Euclidean inner product.

(a) $u = (3, 2)$, $v = (4, -1)$

(b) $u = (-3, 1, 0)$, $v = (2, -1, 3)$

(c) $u = (-4, 2, 1)$, $v = (8, -4, -2)$

(d) $u = (0, -2, 2, 1)$, $v = (-1, -1, 1, 1)$

12. In each part, verify that the Cauchy-Schwarz inequality holds for the given vectors.

(a) $u = (-2, 1)$ and $v = (1, 0)$ using the inner product of Example 1 of Section 6.1.

(b) ... using the inner product in Example 6 of Section 6.1.

(c) $p = -1 + 2x + x^2$ and $q = 2 - 4x$ using the inner product given in Example 7 of Section 6.1.

13. Let $R^4$ have the Euclidean inner product, and let $u = (-1, 1, 0, 2)$. Determine whether the vector u is orthogonal to the subspace spanned by the vectors $w_1 = (0, 0, 0, 0)$, $w_2 = (1, -1, 3, 0)$, and $w_3 = (4, 0, 9, 2)$.

Answer:

No

In Exercises 14-15, assume that $R^n$ has the Euclidean inner product.

14. Let W be the line in $R^2$ with equation $y = 2x$. Find an equation for $W^\perp$.

15. (a) Let W be the plane in $R^3$ with equation $x - 2y - 3z = 0$. Find parametric equations for $W^\perp$.

(b) Let W be the line in $R^3$ with parametric equations

$$x = 2t, \quad y = -5t, \quad z = 4t$$

Find an equation for $W^\perp$.

(c) Let W be the intersection of the two planes

$$x + y + z = 0 \quad \text{and} \quad x - y + z = 0$$

in $R^3$. Find an equation for $W^\perp$.

Answer:

(a) $x = t$, $y = -2t$, $z = -3t$

(b) $2x - 5y + 4z = 0$

(c) $x - z = 0$

16. Find a basis for the orthogonal complement of the subspace of $R^n$ spanned by the vectors.

(a) $v_1 = (1, -1, 3)$, $v_2 = (5, -4, -4)$, $v_3 = (7, -6, 2)$

(b) $v_1 = (2, 0, -1)$, $v_2 = (4, 0, -2)$

(c) $v_1 = (1, 4, 5, 2)$, $v_2 = (2, 1, 3, 0)$, $v_3 = (-1, 3, 2, 2)$

(d) $v_1 = (1, 4, 5, 6, 9)$, $v_2 = (3, -2, 1, 4, -1)$, $v_3 = (-1, 0, -1, -2, -1)$, $v_4 = (2, 3, 5, 7, 8)$

17. Let V be an inner product space. Show that if u and v are orthogonal unit vectors in V, then $\|u - v\| = \sqrt{2}$.

18. Let V be an inner product space. Show that if w is orthogonal to both $u_1$ and $u_2$, then it is orthogonal to $k_1u_1 + k_2u_2$ for all scalars $k_1$ and $k_2$. Interpret this result geometrically in the case where V is $R^2$ with the Euclidean inner product.

19. Let V be an inner product space. Show that if w is orthogonal to each of the vectors $u_1, u_2, \ldots, u_r$, then it is orthogonal to every vector in $\operatorname{span}\{u_1, u_2, \ldots, u_r\}$.

20. Let $\{v_1, v_2, \ldots, v_r\}$ be a basis for an inner product space V. Show that the zero vector is the only vector in V that is orthogonal to all of the basis vectors.

21. Let $\{w_1, w_2, \ldots, w_k\}$ be a basis for a subspace W of V. Show that $W^\perp$ consists of all vectors in V that are orthogonal to every basis vector.

22. Prove the following generalization of Theorem 6.2.3: If $v_1, v_2, \ldots, v_r$ are pairwise orthogonal vectors in an inner product space V, then

$$\|v_1 + v_2 + \cdots + v_r\|^2 = \|v_1\|^2 + \|v_2\|^2 + \cdots + \|v_r\|^2$$

23. Prove: If u and v are $n \times 1$ matrices and A is an $n \times n$ matrix, then

$$(v^TA^TAu)^2 \le (u^TA^TAu)(v^TA^TAv)$$

24. Use the Cauchy-Schwarz inequality to prove that for all real values of a, b, and θ,

$$(a\cos\theta + b\sin\theta)^2 \le a^2 + b^2$$

25. Prove: If $w_1, w_2, \ldots, w_n$ are positive real numbers, and if $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ are any two vectors in $R^n$, then

$$|w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n| \le (w_1u_1^2 + w_2u_2^2 + \cdots + w_nu_n^2)^{1/2}(w_1v_1^2 + w_2v_2^2 + \cdots + w_nv_n^2)^{1/2}$$

26. Show that equality holds in the Cauchy-Schwarz inequality if and only if u and v are linearly dependent. 

27. Use vector methods to prove that a triangle that is inscribed in a circle so that it has a diameter for a side must be a right triangle. [Hint: Express the vectors ... in the accompanying figure in terms of u and v.]

Figure Ex-27

28. As illustrated in the accompanying figure, the vectors $u = (1, \sqrt{3})$ and $v = (-1, \sqrt{3})$ have norm 2 and an angle of 60° between them relative to the Euclidean inner product. Find a weighted Euclidean inner product with respect to which u and v are orthogonal unit vectors.

Figure Ex-28 The vectors $(1, \sqrt{3})$ and $(-1, \sqrt{3})$ with the 60° angle between them.


29. Calculus required Let f(x) and g(x) be continuous functions on [0, 1]. Prove:

(a)

$$\left[\int_0^1 f(x)g(x)\,dx\right]^2 \le \int_0^1 f^2(x)\,dx \int_0^1 g^2(x)\,dx$$

(b)

$$\left[\int_0^1 [f(x) + g(x)]^2\,dx\right]^{1/2} \le \left[\int_0^1 f^2(x)\,dx\right]^{1/2} + \left[\int_0^1 g^2(x)\,dx\right]^{1/2}$$

[Hint: Use the Cauchy-Schwarz inequality.]

30. Calculus required Let $C[0, \pi]$ have the inner product

$$\langle f, g\rangle = \int_0^\pi f(x)g(x)\,dx$$

and let $f_n = \cos nx$ $(n = 0, 1, 2, \ldots)$. Show that if $k \ne l$, then $f_k$ and $f_l$ are orthogonal vectors.

31. (a) Let W be the line $y = x$ in an xy-coordinate system in $R^2$. Describe the subspace $W^\perp$.

(b) Let W be the y-axis in an xyz-coordinate system in $R^3$. Describe the subspace $W^\perp$.

(c) Let W be the yz-plane of an xyz-coordinate system in $R^3$. Describe the subspace $W^\perp$.

Answer:

(a) The line $y = -x$

(b) The xz-plane

(c) The x-axis

32. Prove that Formula 4 holds for all nonzero vectors u and v in an inner product space V. 
















True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) If u is orthogonal to every vector of a subspace W, then $u = 0$.

Answer:

False

(b) If u is a vector in both W and $W^\perp$, then $u = 0$.

Answer:

True

(c) If u and v are vectors in $W^\perp$, then $u + v$ is in $W^\perp$.

Answer:

True

(d) If u is a vector in $W^\perp$ and k is a real number, then $ku$ is in $W^\perp$.

Answer:

True

(e) If u and v are orthogonal, then $|\langle u, v\rangle| = \|u\|\|v\|$.

Answer:

False

(f) If u and v are orthogonal, then $\|u + v\| = \|u\| + \|v\|$.

Answer:

False

6.3 Gram-Schmidt Process; QR-Decomposition

In many problems involving vector spaces, the problem solver is free to choose any basis for the vector space that 
seems appropriate. In inner product spaces, the solution of a problem is often greatly simplified by choosing a basis 
in which the vectors are orthogonal to one another. In this section we will show how such bases can be obtained. 


Orthogonal and Orthonormal Sets 

Recall from Section 6.2 that two vectors in an inner product space are said to be orthogonal if their inner product is 
zero. The following definition extends the notion of orthogonality to sets of vectors in an inner product space. 


DEFINITION 1 

A set of two or more vectors in a real inner product space is said to be orthogonal if all pairs of distinct vectors in the set are orthogonal. An orthogonal set in which each vector has norm 1 is said to be orthonormal.


EXAMPLE 1 An Orthogonal Set in $R^3$

Let

$$u_1 = (0, 1, 0),\quad u_2 = (1, 0, 1),\quad u_3 = (1, 0, -1)$$

and assume that $R^3$ has the Euclidean inner product. It follows that the set of vectors $S = \{u_1, u_2, u_3\}$ is orthogonal since $\langle u_1, u_2\rangle = \langle u_1, u_3\rangle = \langle u_2, u_3\rangle = 0$.


If $v$ is a nonzero vector in an inner product space, then it follows from Theorem 6.1.1(b) with $k = 1/\|v\|$ that

$$\left\|\frac{1}{\|v\|}\,v\right\| = \left|\frac{1}{\|v\|}\right| \|v\| = \frac{1}{\|v\|}\,\|v\| = 1$$


from which we see that multiplying a nonzero vector by the reciprocal of its norm produces a vector of norm 1. This 
process is called normalizing v. It follows that any orthogonal set of nonzero vectors can be converted to an 
orthonormal set by normalizing each of its vectors. 


EXAMPLE 2 Constructing an Orthonormal Set 


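The Euclidean norms of the vectors in Example 1 are

$$\|u_1\| = 1,\quad \|u_2\| = \sqrt{2},\quad \|u_3\| = \sqrt{2}$$

Consequently, normalizing $u_1$, $u_2$, and $u_3$ yields

$$v_1 = \frac{u_1}{\|u_1\|} = (0, 1, 0),\quad v_2 = \frac{u_2}{\|u_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right),\quad v_3 = \frac{u_3}{\|u_3\|} = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$

We leave it for you to verify that the set $S = \{v_1, v_2, v_3\}$ is orthonormal by showing that

$$\langle v_1, v_2\rangle = \langle v_1, v_3\rangle = \langle v_2, v_3\rangle = 0 \quad\text{and}\quad \|v_1\| = \|v_2\| = \|v_3\| = 1$$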
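The orthogonality check of Example 1 and the normalization of Example 2 can be sketched in a few lines of Python. This is a minimal illustration that assumes the Euclidean inner product on $R^n$ and floating-point arithmetic; the helper names `inner`, `norm`, `is_orthogonal_set`, and `normalize` are our own, not part of the text.

```python
import math

def inner(u, v):
    """Euclidean inner product <u, v> on R^n."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean norm ||v|| = sqrt(<v, v>)."""
    return math.sqrt(inner(v, v))

def is_orthogonal_set(vectors, tol=1e-12):
    """True if every pair of distinct vectors has inner product zero (within tol)."""
    return all(abs(inner(vectors[i], vectors[j])) <= tol
               for i in range(len(vectors))
               for j in range(i + 1, len(vectors)))

def normalize(v):
    """Multiply a nonzero vector by the reciprocal of its norm."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(x / length for x in v)

# The vectors of Example 1
u1, u2, u3 = (0, 1, 0), (1, 0, 1), (1, 0, -1)
vs = [normalize(u) for u in (u1, u2, u3)]

print(is_orthogonal_set([u1, u2, u3]))               # True
print(is_orthogonal_set(vs))                         # True
print(all(math.isclose(norm(v), 1.0) for v in vs))   # True
```

The tolerance `tol` is an assumption needed only because floating-point inner products of exactly orthogonal vectors may come out as tiny nonzero numbers.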


In $R^2$ any two nonzero perpendicular vectors are linearly independent because neither is a scalar multiple of the other; and in $R^3$ any three nonzero mutually perpendicular vectors are linearly independent because no one lies in the plane of the other two (and hence is not expressible as a linear combination of the other two). The following theorem generalizes these observations.


THEOREM 6.3.1 

If $S = \{v_1, v_2, \ldots, v_n\}$ is an orthogonal set of nonzero vectors in an inner product space, then $S$ is linearly independent.


Proof Assume that

$$k_1 v_1 + k_2 v_2 + \cdots + k_n v_n = 0 \tag{1}$$

To demonstrate that $S = \{v_1, v_2, \ldots, v_n\}$ is linearly independent, we must prove that $k_1 = k_2 = \cdots = k_n = 0$.

For each $v_i$ in $S$, it follows from (1) that

$$\langle k_1 v_1 + k_2 v_2 + \cdots + k_n v_n, v_i\rangle = \langle 0, v_i\rangle = 0$$

or, equivalently,

$$k_1\langle v_1, v_i\rangle + k_2\langle v_2, v_i\rangle + \cdots + k_n\langle v_n, v_i\rangle = 0$$

From the orthogonality of $S$ it follows that $\langle v_j, v_i\rangle = 0$ when $j \ne i$, so this equation reduces to

$$k_i\langle v_i, v_i\rangle = 0$$

Since the vectors in $S$ are assumed to be nonzero, it follows from the positivity axiom for inner products that $\langle v_i, v_i\rangle \ne 0$. Thus, the preceding equation implies that each $k_i$ in Equation 1 is zero, which is what we wanted to prove.


Since an orthonormal set is orthogonal, and since 
its vectors are nonzero (norm 1), it follows from 
Theorem 6.3.1 that every orthonormal set is 
linearly independent. 


In an inner product space, a basis consisting of orthonormal vectors is called an orthonormal basis, and a basis consisting of orthogonal vectors is called an orthogonal basis. A familiar example of an orthonormal basis is the standard basis for $R^n$ with the Euclidean inner product:

$$e_1 = (1, 0, 0, \ldots, 0),\quad e_2 = (0, 1, 0, \ldots, 0),\quad \ldots,\quad e_n = (0, 0, 0, \ldots, 1)$$

EXAMPLE 3 An Orthonormal Basis 

In Example 2 we showed that the vectors

$$v_1 = (0, 1, 0),\quad v_2 = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right),\quad\text{and}\quad v_3 = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$

form an orthonormal set with respect to the Euclidean inner product on $R^3$. By Theorem 6.3.1, these vectors form a linearly independent set, and since $R^3$ is three-dimensional, it follows from Theorem 4.5.4 that $S = \{v_1, v_2, v_3\}$ is an orthonormal basis for $R^3$.


Coordinates Relative to Orthonormal Bases 


One way to express a vector $u$ as a linear combination of basis vectors

$$S = \{v_1, v_2, \ldots, v_n\}$$

is to convert the vector equation

$$u = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$$

to a linear system and solve for the coefficients $c_1, c_2, \ldots, c_n$. However, if the basis happens to be orthogonal or orthonormal, then the following theorem shows that the coefficients can be obtained more simply by computing appropriate inner products.


THEOREM 6.3.2 


(a) If $S = \{v_1, v_2, \ldots, v_n\}$ is an orthogonal basis for an inner product space $V$, and if $u$ is any vector in $V$, then

$$u = \frac{\langle u, v_1\rangle}{\|v_1\|^2}v_1 + \frac{\langle u, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \frac{\langle u, v_n\rangle}{\|v_n\|^2}v_n \tag{2}$$

(b) If $S = \{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis for an inner product space $V$, and if $u$ is any vector in $V$, then

$$u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \cdots + \langle u, v_n\rangle v_n \tag{3}$$


Proof (a) Since $S = \{v_1, v_2, \ldots, v_n\}$ is a basis for $V$, every vector $u$ in $V$ can be expressed in the form

$$u = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$$

We will complete the proof by showing that

$$c_i = \frac{\langle u, v_i\rangle}{\|v_i\|^2} \tag{4}$$

for $i = 1, 2, \ldots, n$. To do this, observe first that

$$\begin{aligned}\langle u, v_i\rangle &= \langle c_1 v_1 + c_2 v_2 + \cdots + c_n v_n, v_i\rangle\\ &= c_1\langle v_1, v_i\rangle + c_2\langle v_2, v_i\rangle + \cdots + c_n\langle v_n, v_i\rangle\end{aligned}$$

Since $S$ is an orthogonal set, all of the inner products in the last equality are zero except the $i$th, so we have

$$\langle u, v_i\rangle = c_i\langle v_i, v_i\rangle = c_i\|v_i\|^2$$

Solving this equation for $c_i$ yields (4), which completes the proof.

Proof (b) In this case, $\|v_1\| = \|v_2\| = \cdots = \|v_n\| = 1$, so Formula 2 simplifies to Formula 3.


Using the terminology and notation from Definition 2 of Section 4.4, it follows from Theorem 6.3.2 that the coordinate vector of a vector $u$ in $V$ relative to an orthogonal basis $S = \{v_1, v_2, \ldots, v_n\}$ is

$$(u)_S = \left(\frac{\langle u, v_1\rangle}{\|v_1\|^2}, \frac{\langle u, v_2\rangle}{\|v_2\|^2}, \ldots, \frac{\langle u, v_n\rangle}{\|v_n\|^2}\right) \tag{5}$$

and relative to an orthonormal basis $S = \{v_1, v_2, \ldots, v_n\}$ is

$$(u)_S = (\langle u, v_1\rangle, \langle u, v_2\rangle, \ldots, \langle u, v_n\rangle) \tag{6}$$


EXAMPLE 4 A Coordinate Vector Relative to an Orthonormal Basis 

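Let

$$v_1 = (0, 1, 0),\quad v_2 = \left(-\frac{4}{5}, 0, \frac{3}{5}\right),\quad v_3 = \left(\frac{3}{5}, 0, \frac{4}{5}\right)$$

It is easy to check that $S = \{v_1, v_2, v_3\}$ is an orthonormal basis for $R^3$ with the Euclidean inner product. Express the vector $u = (1, 1, 1)$ as a linear combination of the vectors in $S$, and find the coordinate vector $(u)_S$.

We leave it for you to verify that

$$\langle u, v_1\rangle = 1,\quad \langle u, v_2\rangle = -\frac{1}{5},\quad\text{and}\quad \langle u, v_3\rangle = \frac{7}{5}$$

Therefore, by Theorem 6.3.2 we have

$$u = v_1 - \frac{1}{5}v_2 + \frac{7}{5}v_3$$

that is,

$$(1, 1, 1) = (0, 1, 0) - \frac{1}{5}\left(-\frac{4}{5}, 0, \frac{3}{5}\right) + \frac{7}{5}\left(\frac{3}{5}, 0, \frac{4}{5}\right)$$

Thus, the coordinate vector of $u$ relative to $S$ is

$$(u)_S = (\langle u, v_1\rangle, \langle u, v_2\rangle, \langle u, v_3\rangle) = \left(1, -\frac{1}{5}, \frac{7}{5}\right)$$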
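Formula 6 reduces the search for coordinates to a list of inner products. The following sketch, which assumes the Euclidean inner product and exact rational arithmetic through Python's standard `fractions` module (the helper names `coords_orthonormal` and `combine` are hypothetical), recomputes $(u)_S$ for Example 4 and then rebuilds $u$ from it as a check.

```python
from fractions import Fraction as F

def inner(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def coords_orthonormal(u, basis):
    """(u)_S = (<u, v1>, ..., <u, vn>) for an orthonormal basis S (Formula 6)."""
    return tuple(inner(u, v) for v in basis)

def combine(coeffs, basis):
    """Form the linear combination c1*v1 + ... + cn*vn."""
    n = len(basis[0])
    return tuple(sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(n))

# Orthonormal basis and vector of Example 4
S = [(0, 1, 0), (F(-4, 5), 0, F(3, 5)), (F(3, 5), 0, F(4, 5))]
u = (1, 1, 1)

c = coords_orthonormal(u, S)
print(c)                       # (1, Fraction(-1, 5), Fraction(7, 5))
print(combine(c, S) == u)      # True: the combination reproduces u
```

Exact fractions are used only so the output can be compared with the hand computation; with floats the same code works up to rounding.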







EXAMPLE 5 An Orthonormal Basis from an Orthogonal Basis 


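(a) Show that the vectors

$$w_1 = (0, 2, 0),\quad w_2 = (3, 0, 3),\quad w_3 = (-4, 0, 4)$$

form an orthogonal basis for $R^3$ with the Euclidean inner product, and use that basis to find an orthonormal basis by normalizing each vector.

(b) Express the vector $u = (1, 2, 4)$ as a linear combination of the orthonormal basis vectors obtained in part (a).


Solution

(a) The given vectors form an orthogonal set since

$$\langle w_1, w_2\rangle = 0,\quad \langle w_1, w_3\rangle = 0,\quad \langle w_2, w_3\rangle = 0$$

It follows from Theorem 6.3.1 that these vectors are linearly independent and hence form a basis for $R^3$ by Theorem 4.5.4. We leave it for you to calculate the norms of $w_1$, $w_2$, and $w_3$ and then obtain the orthonormal basis

$$v_1 = \frac{w_1}{\|w_1\|} = (0, 1, 0),\quad v_2 = \frac{w_2}{\|w_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right),\quad v_3 = \frac{w_3}{\|w_3\|} = \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$

(b) It follows from Formula 3 that

$$u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \langle u, v_3\rangle v_3$$

We leave it for you to confirm that

$$\langle u, v_1\rangle = (1, 2, 4)\cdot(0, 1, 0) = 2,\quad \langle u, v_2\rangle = (1, 2, 4)\cdot\left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{5}{\sqrt{2}},\quad \langle u, v_3\rangle = (1, 2, 4)\cdot\left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{3}{\sqrt{2}}$$

and hence that

$$(1, 2, 4) = 2(0, 1, 0) + \frac{5}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) + \frac{3}{\sqrt{2}}\left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$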
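For an orthogonal basis that has not been normalized, Formula 2 can be applied directly, without computing any square roots. As a sketch (exact arithmetic with `fractions.Fraction`; the function name `coords_orthogonal` is hypothetical), here are the coefficients of $u = (1, 2, 4)$ relative to $w_1, w_2, w_3$ of Example 5.

```python
from fractions import Fraction

def inner(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def coords_orthogonal(u, basis):
    """Coefficients <u, v_i> / ||v_i||^2 of Formula 2 (orthogonal basis, not necessarily unit)."""
    return tuple(Fraction(inner(u, v), inner(v, v)) for v in basis)

W = [(0, 2, 0), (3, 0, 3), (-4, 0, 4)]   # w1, w2, w3 of Example 5
u = (1, 2, 4)

c = coords_orthogonal(u, W)
recon = tuple(sum(ci * w[k] for ci, w in zip(c, W)) for k in range(3))

print(c)            # (Fraction(1, 1), Fraction(5, 6), Fraction(3, 8))
print(recon == u)   # True
```

Multiplying each coefficient by the corresponding norm $\|w_1\| = 2$, $\|w_2\| = 3\sqrt{2}$, $\|w_3\| = 4\sqrt{2}$ gives $2$, $5/\sqrt{2}$, and $3/\sqrt{2}$, the coefficients relative to the normalized basis.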


Orthogonal Projections 

Many applied problems are best solved by working with orthogonal or orthonormal basis vectors. Such bases are typically found by starting with some simple basis (say a standard basis) and then converting that basis into an orthogonal or orthonormal basis. To explain exactly how that is done will require some preliminary ideas about orthogonal projections.

In Section 3.3 we proved a result called the Projection Theorem (see Theorem 3.3.2) which dealt with the problem of decomposing a vector $u$ in $R^n$ into a sum of two terms, $w_1$ and $w_2$, in which $w_1$ is the orthogonal projection of $u$ on some nonzero vector $a$ and $w_2$ is orthogonal to $w_1$ (Figure 3.3.2). That result is a special case of the following more general theorem.


THEOREM 6.3.3 Projection Theorem

If $W$ is a finite-dimensional subspace of an inner product space $V$, then every vector $u$ in $V$ can be expressed in exactly one way as

$$u = w_1 + w_2 \tag{7}$$

where $w_1$ is in $W$ and $w_2$ is in $W^\perp$.


The vectors $w_1$ and $w_2$ in Formula 7 are commonly denoted by

$$w_1 = \operatorname{proj}_W u \quad\text{and}\quad w_2 = \operatorname{proj}_{W^\perp} u \tag{8}$$

They are called the orthogonal projection of $u$ on $W$ and the orthogonal projection of $u$ on $W^\perp$, respectively. The vector $w_2$ is also called the component of $u$ orthogonal to $W$. Using the notation in (8), Formula 7 can be expressed as

$$u = \operatorname{proj}_W u + \operatorname{proj}_{W^\perp} u \tag{9}$$

(Figure 6.3.1). Moreover, since $\operatorname{proj}_{W^\perp} u = u - \operatorname{proj}_W u$, we can also express Formula 9 as

$$u = \operatorname{proj}_W u + (u - \operatorname{proj}_W u) \tag{10}$$


[Figure 6.3.1: $u$ decomposed as $\operatorname{proj}_W u$, lying in $W$, plus $\operatorname{proj}_{W^\perp} u$, lying in $W^\perp$.]


The following theorem provides formulas for calculating orthogonal projections. 


THEOREM 6.3.4 


Let $W$ be a finite-dimensional subspace of an inner product space $V$.

(a) If $\{v_1, v_2, \ldots, v_r\}$ is an orthogonal basis for $W$, and $u$ is any vector in $V$, then

$$\operatorname{proj}_W u = \frac{\langle u, v_1\rangle}{\|v_1\|^2}v_1 + \frac{\langle u, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \frac{\langle u, v_r\rangle}{\|v_r\|^2}v_r \tag{11}$$

(b) If $\{v_1, v_2, \ldots, v_r\}$ is an orthonormal basis for $W$, and $u$ is any vector in $V$, then

$$\operatorname{proj}_W u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \cdots + \langle u, v_r\rangle v_r \tag{12}$$


Proof (a) It follows from Theorem 6.3.3 that the vector $u$ can be expressed in the form $u = w_1 + w_2$, where $w_1 = \operatorname{proj}_W u$ is in $W$ and $w_2$ is in $W^\perp$; and it follows from Theorem 6.3.2 that the component $\operatorname{proj}_W u = w_1$ can be expressed in terms of the basis vectors for $W$ as

$$\operatorname{proj}_W u = w_1 = \frac{\langle w_1, v_1\rangle}{\|v_1\|^2}v_1 + \frac{\langle w_1, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \frac{\langle w_1, v_r\rangle}{\|v_r\|^2}v_r \tag{13}$$

Since $w_2$ is orthogonal to $W$, it follows that

$$\langle w_2, v_1\rangle = \langle w_2, v_2\rangle = \cdots = \langle w_2, v_r\rangle = 0$$

so we can rewrite (13) as

$$\operatorname{proj}_W u = w_1 = \frac{\langle w_1 + w_2, v_1\rangle}{\|v_1\|^2}v_1 + \frac{\langle w_1 + w_2, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \frac{\langle w_1 + w_2, v_r\rangle}{\|v_r\|^2}v_r$$

or, equivalently, as

$$\operatorname{proj}_W u = w_1 = \frac{\langle u, v_1\rangle}{\|v_1\|^2}v_1 + \frac{\langle u, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \frac{\langle u, v_r\rangle}{\|v_r\|^2}v_r$$

which is Formula 11.

Proof (b) In this case, $\|v_1\| = \|v_2\| = \cdots = \|v_r\| = 1$, so Formula 11 simplifies to Formula 12.


EXAMPLE 6 Calculating Projections 

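Let $R^3$ have the Euclidean inner product, and let $W$ be the subspace spanned by the orthonormal vectors $v_1 = (0, 1, 0)$ and $v_2 = \left(-\frac{4}{5}, 0, \frac{3}{5}\right)$. From Formula 12 the orthogonal projection of $u = (1, 1, 1)$ on $W$ is

$$\begin{aligned}\operatorname{proj}_W u &= \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2\\ &= (1)(0, 1, 0) + \left(-\frac{1}{5}\right)\left(-\frac{4}{5}, 0, \frac{3}{5}\right)\\ &= \left(\frac{4}{25}, 1, -\frac{3}{25}\right)\end{aligned}$$

The component of $u$ orthogonal to $W$ is

$$\operatorname{proj}_{W^\perp} u = u - \operatorname{proj}_W u = (1, 1, 1) - \left(\frac{4}{25}, 1, -\frac{3}{25}\right) = \left(\frac{21}{25}, 0, \frac{28}{25}\right)$$

Observe that $\operatorname{proj}_{W^\perp} u$ is orthogonal to both $v_1$ and $v_2$, so this vector is orthogonal to each vector in the space $W$ spanned by $v_1$ and $v_2$, as it should be.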
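Formula 11, and its special case Formula 12, translate into a short loop over the basis of $W$. The sketch below assumes the Euclidean inner product and exact `Fraction` arithmetic; `proj` is a hypothetical helper name. It reproduces the two vectors of Example 6 and confirms that the second is orthogonal to $v_1$ and $v_2$.

```python
from fractions import Fraction as F

def inner(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def proj(u, basis):
    """Orthogonal projection of u on W = span(basis); basis must be orthogonal (Formula 11)."""
    result = [F(0)] * len(u)
    for v in basis:
        coeff = F(inner(u, v)) / inner(v, v)      # <u, v> / ||v||^2
        result = [r + coeff * x for r, x in zip(result, v)]
    return tuple(result)

v1, v2 = (0, 1, 0), (F(-4, 5), 0, F(3, 5))     # orthonormal vectors of Example 6
u = (1, 1, 1)

w1 = proj(u, [v1, v2])                         # proj_W u
w2 = tuple(a - b for a, b in zip(u, w1))       # proj_{W-perp} u = u - proj_W u

print(w1)                            # (Fraction(4, 25), Fraction(1, 1), Fraction(-3, 25))
print(w2)                            # (Fraction(21, 25), Fraction(0, 1), Fraction(28, 25))
print(inner(w2, v1), inner(w2, v2))  # 0 0
```

Because the function divides by $\|v\|^2$, it also handles orthogonal bases that are not normalized; for an orthonormal basis the division by 1 is harmless.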


A Geometric Interpretation of Orthogonal Projections 


If $W$ is a one-dimensional subspace of an inner product space $V$, say $\operatorname{span}\{a\}$, then Formula 11 has only the one term

$$\operatorname{proj}_W u = \frac{\langle u, a\rangle}{\|a\|^2}a$$

In the special case where $V$ is $R^n$ with the Euclidean inner product, this is exactly Formula 10 of Section 3.3 for the orthogonal projection of $u$ along $a$. This suggests that we can think of (11) as the sum of orthogonal projections on "axes" determined by the basis vectors for the subspace $W$ (Figure 6.3.2).
Figure 6.3.2


The Gram-Schmidt Process 

We have seen that orthonormal bases exhibit a variety of useful properties. Our next theorem, which is the main 
result in this section, shows that every nonzero finite-dimensional inner product space has an orthonormal basis. The proof
of this result is extremely important, since it provides an algorithm, or method, for converting an arbitrary basis into 
an orthonormal basis. 


THEOREM 6.3.5 

Every nonzero finite-dimensional inner product space has an orthonormal basis. 


Proof Let $W$ be any nonzero finite-dimensional subspace of an inner product space, and suppose that $\{u_1, u_2, \ldots, u_r\}$ is any basis for $W$. It suffices to show that $W$ has an orthogonal basis, since the vectors in that basis can be normalized to obtain an orthonormal basis. The following sequence of steps will produce an orthogonal basis $\{v_1, v_2, \ldots, v_r\}$ for $W$:

Step 1. Let $v_1 = u_1$.


Step 2. As illustrated in Figure 6.3.3, we can obtain a vector $v_2$ that is orthogonal to $v_1$ by computing the component of $u_2$ that is orthogonal to the space $W_1$ spanned by $v_1$. Using Formula 11 to perform this computation we obtain

$$v_2 = u_2 - \operatorname{proj}_{W_1} u_2 = u_2 - \frac{\langle u_2, v_1\rangle}{\|v_1\|^2}v_1$$

Of course, if $v_2 = 0$, then $v_2$ is not a basis vector. But this cannot happen, since it would then follow from the above formula for $v_2$ that

$$u_2 = \frac{\langle u_2, v_1\rangle}{\|v_1\|^2}v_1 = \frac{\langle u_2, u_1\rangle}{\|u_1\|^2}u_1$$

which implies that $u_2$ is a multiple of $u_1$, contradicting the linear independence of the basis $S = \{u_1, u_2, \ldots, u_r\}$.


[Figure 6.3.3: $v_2 = u_2 - \operatorname{proj}_{W_1} u_2$ is the component of $u_2$ orthogonal to $W_1 = \operatorname{span}\{v_1\}$.]

Step 3. To construct a vector $v_3$ that is orthogonal to both $v_1$ and $v_2$, we compute the component of $u_3$ orthogonal to the space $W_2$ spanned by $v_1$ and $v_2$ (Figure 6.3.4). Using Formula 11 to perform this computation we obtain

$$v_3 = u_3 - \operatorname{proj}_{W_2} u_3 = u_3 - \frac{\langle u_3, v_1\rangle}{\|v_1\|^2}v_1 - \frac{\langle u_3, v_2\rangle}{\|v_2\|^2}v_2$$

As in Step 2, the linear independence of $\{u_1, u_2, \ldots, u_r\}$ ensures that $v_3 \ne 0$. We leave the details for you.


[Figure 6.3.4: $v_3 = u_3 - \operatorname{proj}_{W_2} u_3$ is the component of $u_3$ orthogonal to the space $W_2$ spanned by $v_1$ and $v_2$.]


Step 4. To determine a vector $v_4$ that is orthogonal to $v_1$, $v_2$, and $v_3$, we compute the component of $u_4$ orthogonal to the space $W_3$ spanned by $v_1$, $v_2$, and $v_3$. From (11),

$$v_4 = u_4 - \operatorname{proj}_{W_3} u_4 = u_4 - \frac{\langle u_4, v_1\rangle}{\|v_1\|^2}v_1 - \frac{\langle u_4, v_2\rangle}{\|v_2\|^2}v_2 - \frac{\langle u_4, v_3\rangle}{\|v_3\|^2}v_3$$


Continuing in this way we will produce an orthogonal set of nonzero vectors $\{v_1, v_2, \ldots, v_r\}$ after $r$ steps. Since orthogonal sets of nonzero vectors are linearly independent (Theorem 6.3.1), this set will be an orthogonal basis for the $r$-dimensional space $W$. By normalizing these basis vectors we can obtain an orthonormal basis.













The step-by-step construction of an orthogonal (or orthonormal) basis given in the foregoing proof is called the Gram-Schmidt process. For reference, we provide the following summary of the steps.


The Gram-Schmidt Process 


To convert a basis $\{u_1, u_2, \ldots, u_r\}$ into an orthogonal basis $\{v_1, v_2, \ldots, v_r\}$, perform the following computations:

Step 1. $v_1 = u_1$

Step 2. $v_2 = u_2 - \dfrac{\langle u_2, v_1\rangle}{\|v_1\|^2}v_1$

Step 3. $v_3 = u_3 - \dfrac{\langle u_3, v_1\rangle}{\|v_1\|^2}v_1 - \dfrac{\langle u_3, v_2\rangle}{\|v_2\|^2}v_2$

Step 4. $v_4 = u_4 - \dfrac{\langle u_4, v_1\rangle}{\|v_1\|^2}v_1 - \dfrac{\langle u_4, v_2\rangle}{\|v_2\|^2}v_2 - \dfrac{\langle u_4, v_3\rangle}{\|v_3\|^2}v_3$

(continue for $r$ steps)

Optional Step. To convert the orthogonal basis into an orthonormal basis $\{q_1, q_2, \ldots, q_r\}$, normalize the orthogonal basis vectors.
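The steps in the box translate directly into code. The following is a minimal sketch of the classical Gram-Schmidt process (each coefficient is computed from the original $u_k$, exactly as in the formulas above), assuming vectors in $R^n$ with the Euclidean inner product and exact rational arithmetic; `gram_schmidt` is a hypothetical name. Applied to the basis used in Example 7, it produces the orthogonal vectors found there.

```python
from fractions import Fraction

def inner(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Return an orthogonal basis v1, ..., vr built from u1, ..., ur (Steps 1, 2, 3, ...).

    Uses exact rational arithmetic; raises ValueError if the inputs are dependent.
    """
    ortho = []
    for u in basis:
        v = [Fraction(x) for x in u]
        for w in ortho:                      # subtract the projection of u on each earlier v_i
            coeff = inner(u, w) / inner(w, w)
            v = [a - coeff * b for a, b in zip(v, w)]
        if all(x == 0 for x in v):
            raise ValueError("input vectors are linearly dependent")
        ortho.append(v)
    return [tuple(v) for v in ortho]

# Basis of Example 7
for v in gram_schmidt([(1, 1, 1), (0, 1, 1), (0, 0, 1)]):
    print([str(x) for x in v])
# ['1', '1', '1']
# ['-2/3', '1/3', '1/3']
# ['0', '-1/2', '1/2']
```

In floating-point work one often prefers the modified Gram-Schmidt variant, which computes each coefficient from the partially reduced vector instead of the original $u_k$; in exact arithmetic, as here, both variants give identical results.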


EXAMPLE 7 Using the Gram-Schmidt Process 

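Assume that the vector space $R^3$ has the Euclidean inner product. Apply the Gram-Schmidt process to transform the basis vectors

$$u_1 = (1, 1, 1),\quad u_2 = (0, 1, 1),\quad u_3 = (0, 0, 1)$$

into an orthogonal basis $\{v_1, v_2, v_3\}$, and then normalize the orthogonal basis vectors to obtain an orthonormal basis $\{q_1, q_2, q_3\}$.

Solution

Step 1. $v_1 = u_1 = (1, 1, 1)$

Step 2.

$$v_2 = u_2 - \operatorname{proj}_{W_1} u_2 = u_2 - \frac{\langle u_2, v_1\rangle}{\|v_1\|^2}v_1 = (0, 1, 1) - \frac{2}{3}(1, 1, 1) = \left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right)$$

Step 3.

$$\begin{aligned}v_3 &= u_3 - \operatorname{proj}_{W_2} u_3 = u_3 - \frac{\langle u_3, v_1\rangle}{\|v_1\|^2}v_1 - \frac{\langle u_3, v_2\rangle}{\|v_2\|^2}v_2\\ &= (0, 0, 1) - \frac{1}{3}(1, 1, 1) - \frac{1/3}{2/3}\left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right) = \left(0, -\frac{1}{2}, \frac{1}{2}\right)\end{aligned}$$

Thus,

$$v_1 = (1, 1, 1),\quad v_2 = \left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right),\quad v_3 = \left(0, -\frac{1}{2}, \frac{1}{2}\right)$$

form an orthogonal basis for $R^3$. The norms of these vectors are

$$\|v_1\| = \sqrt{3},\quad \|v_2\| = \frac{\sqrt{6}}{3},\quad \|v_3\| = \frac{1}{\sqrt{2}}$$

so an orthonormal basis for $R^3$ is

$$q_1 = \frac{v_1}{\|v_1\|} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right),\quad q_2 = \frac{v_2}{\|v_2\|} = \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right),\quad q_3 = \frac{v_3}{\|v_3\|} = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$$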


In the last example we normalized at the end to convert the orthogonal basis into an orthonormal basis. 
Alternatively, we could have normalized each orthogonal basis vector as soon as it was obtained, thereby producing 
an orthonormal basis step by step. However, that procedure generally has the disadvantage in hand calculation of 
producing more square roots to manipulate. A more useful variation is to “scale” the orthogonal basis vectors at 
each step to eliminate some of the fractions. For example, after Step 2 above, we could have multiplied by 3 to produce $(-2, 1, 1)$ as the second orthogonal basis vector, thereby simplifying the calculations in Step 3.



Erhard Schmidt (1876-1959)

Schmidt was a German mathematician who studied for his doctoral degree at Göttingen University under David Hilbert, one of the giants of modern mathematics. For most of his life he taught at Berlin University where, in addition to making important contributions to many branches of mathematics, he fashioned some of Hilbert's ideas into a general concept, called a Hilbert space, a fundamental idea in the study of infinite-dimensional vector spaces. He first described the process that bears his name in a paper on integral equations that he published in 1907.

[Image: Archives of the Mathematisches Forschungsinst\ 



Jørgen Pedersen Gram (1850-1916)


Gram was a Danish actuary whose early education was at village schools supplemented by private tutoring. He obtained a doctorate degree in mathematics while working for the Hafnia Life Insurance Company, where he specialized in the mathematics of accident insurance. It was in his dissertation that his contributions to the Gram-Schmidt process were formulated. He eventually became interested in abstract mathematics and received a gold medal from the Royal Danish Society of Sciences and Letters in recognition of his work. His lifelong interest in applied mathematics never wavered, however, and he produced a variety of treatises on Danish forest management.

[Image: wikipedia] 


CALCULUS REQUIRED 


EXAMPLE 8 Legendre Polynomials 


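Let the vector space $P_2$ have the inner product

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

Apply the Gram-Schmidt process to transform the standard basis $\{1, x, x^2\}$ for $P_2$ into an orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$.

Solution Take $u_1 = 1$, $u_2 = x$, and $u_3 = x^2$.

Step 1. $v_1 = u_1 = 1$

Step 2. We have

$$\langle u_2, v_1\rangle = \int_{-1}^{1} x\,dx = 0$$

so

$$v_2 = u_2 - \frac{\langle u_2, v_1\rangle}{\|v_1\|^2}v_1 = u_2 = x$$

Step 3. We have

$$\langle u_3, v_1\rangle = \int_{-1}^{1} x^2\,dx = \left.\frac{x^3}{3}\right|_{-1}^{1} = \frac{2}{3}$$

$$\langle u_3, v_2\rangle = \int_{-1}^{1} x^3\,dx = \left.\frac{x^4}{4}\right|_{-1}^{1} = 0$$

$$\|v_1\|^2 = \langle v_1, v_1\rangle = \int_{-1}^{1} 1\,dx = \left.x\right|_{-1}^{1} = 2$$

so

$$v_3 = u_3 - \frac{\langle u_3, v_1\rangle}{\|v_1\|^2}v_1 - \frac{\langle u_3, v_2\rangle}{\|v_2\|^2}v_2 = x^2 - \frac{1}{3}$$

Thus, we have obtained the orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$ in which

$$\phi_1(x) = 1,\quad \phi_2(x) = x,\quad \phi_3(x) = x^2 - \frac{1}{3}$$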


Remark The orthogonal basis vectors in the foregoing example are often scaled so all three functions have a value of 1 at $x = 1$. The resulting polynomials

$$1,\quad x,\quad \frac{1}{2}(3x^2 - 1)$$

which are known as the first three Legendre polynomials, play an important role in a variety of applications. The scaling does not affect the orthogonality.
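Example 8 and the scaling described after it can be checked with exact arithmetic by storing a polynomial as its coefficient list $[a_0, a_1, a_2]$ and integrating term by term, using $\int_{-1}^{1} x^k\,dx = 2/(k+1)$ for even $k$ and $0$ for odd $k$. The sketch below is illustrative only; its helper names (`poly_mul`, `inner`, `gram_schmidt`) are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of polynomials stored as coefficient lists [a0, a1, a2, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1]; only even powers contribute 2/(k+1)."""
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)

def gram_schmidt(polys):
    """Classical Gram-Schmidt on equal-length coefficient lists."""
    ortho = []
    for u in polys:
        v = [Fraction(c) for c in u]
        for w in ortho:
            coeff = inner(u, w) / inner(w, w)
            v = [a - coeff * b for a, b in zip(v, w)]
        ortho.append(v)
    return ortho

phis = gram_schmidt([[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # from 1, x, x^2
legendre = [[c / sum(p) for c in p] for p in phis]       # rescale so p(1) = sum of coefficients = 1

print([[str(c) for c in p] for p in phis])
# [['1', '0', '0'], ['0', '1', '0'], ['-1/3', '0', '1']]
print([[str(c) for c in p] for p in legendre])
# [['1', '0', '0'], ['0', '1', '0'], ['-1/2', '0', '3/2']]
```

The last list, $-\frac{1}{2} + \frac{3}{2}x^2$, is $\frac{1}{2}(3x^2 - 1)$.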


Extending Orthonormal Sets to Orthonormal Bases 

Recall from part (b) of Theorem 4.5.5 that a linearly independent set in a finite-dimensional vector space can be enlarged to a basis by adding appropriate vectors. The following theorem is an analog of that result for orthogonal and orthonormal sets in finite-dimensional inner product spaces.


THEOREM 6.3.6 

If $W$ is a finite-dimensional inner product space, then:

(a) Every orthogonal set of nonzero vectors in $W$ can be enlarged to an orthogonal basis for $W$.

(b) Every orthonormal set in $W$ can be enlarged to an orthonormal basis for $W$.


Proof We will prove part (b) and leave part (a) as an exercise.

Suppose that $S = \{v_1, v_2, \ldots, v_s\}$ is an orthonormal set of vectors in $W$. Part (b) of Theorem 4.5.5 tells us that we can enlarge $S$ to some basis

$$S' = \{v_1, v_2, \ldots, v_s, v_{s+1}, \ldots, v_k\}$$

for $W$. If we now apply the Gram-Schmidt process to the set $S'$, then the vectors $v_1, v_2, \ldots, v_s$ will not be affected since they are already orthonormal, and the resulting set

$$S'' = \{v_1, v_2, \ldots, v_s, v'_{s+1}, \ldots, v'_k\}$$

will be an orthonormal basis for $W$.
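The proof suggests a practical recipe in $R^n$: append candidate vectors to the orthonormal set and run Gram-Schmidt, discarding any candidate that reduces to zero. The sketch below is a variant of the proof's argument rather than a transcription of it: instead of first choosing a basis via Theorem 4.5.5, it tries the standard basis vectors $e_1, \ldots, e_n$ one at a time. It assumes $W = R^n$ with the Euclidean inner product and floating-point arithmetic with a tolerance `tol`; the function name is hypothetical.

```python
import math

def inner(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def extend_to_orthonormal_basis(S, n, tol=1e-10):
    """Enlarge an orthonormal list S in R^n to an orthonormal basis of R^n.

    Candidates e1, ..., en are tried in order; each is orthogonalized against the
    vectors kept so far and discarded if (numerically) nothing is left.
    The vectors of S are kept unchanged, as in the proof of Theorem 6.3.6.
    """
    basis = [tuple(map(float, v)) for v in S]
    for i in range(n):
        if len(basis) == n:
            break
        e = [1.0 if k == i else 0.0 for k in range(n)]
        for q in basis:                      # q has norm 1, so the coefficient is <e, q>
            c = inner(e, q)
            e = [a - c * b for a, b in zip(e, q)]
        length = math.sqrt(inner(e, e))
        if length > tol:
            basis.append(tuple(a / length for a in e))
    return basis

s = 1 / math.sqrt(2)
B = extend_to_orthonormal_basis([(s, 0.0, s)], 3)
print(len(B))   # 3
```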


OPTIONAL 

QR-Decomposition 

In recent years a numerical algorithm based on the Gram-Schmidt process, and known as QR-decomposition, has
assumed growing importance as the mathematical foundation for a wide variety of numerical algorithms, including 
those for computing eigenvalues of large matrices. The technical aspects of such algorithms are discussed in 
textbooks that specialize in the numerical aspects of linear algebra. However, we will discuss some of the 
underlying ideas here. We begin by posing the following problem. 


Problem 

If $A$ is an $m \times n$ matrix with linearly independent column vectors, and if $Q$ is the matrix that results by applying the Gram-Schmidt process to the column vectors of $A$, what relationship, if any, exists between $A$ and $Q$?


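To solve this problem, suppose that the column vectors of $A$ are $u_1, u_2, \ldots, u_n$ and the orthonormal column vectors of $Q$ are $q_1, q_2, \ldots, q_n$. Thus, $A$ and $Q$ can be written in partitioned form as

$$A = [u_1 \mid u_2 \mid \cdots \mid u_n] \quad\text{and}\quad Q = [q_1 \mid q_2 \mid \cdots \mid q_n]$$

It follows from Theorem 6.3.2(b) that $u_1, u_2, \ldots, u_n$ are expressible in terms of the vectors $q_1, q_2, \ldots, q_n$ as

$$\begin{aligned}
u_1 &= \langle u_1, q_1\rangle q_1 + \langle u_1, q_2\rangle q_2 + \cdots + \langle u_1, q_n\rangle q_n\\
u_2 &= \langle u_2, q_1\rangle q_1 + \langle u_2, q_2\rangle q_2 + \cdots + \langle u_2, q_n\rangle q_n\\
&\;\;\vdots\\
u_n &= \langle u_n, q_1\rangle q_1 + \langle u_n, q_2\rangle q_2 + \cdots + \langle u_n, q_n\rangle q_n
\end{aligned}$$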


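Recalling from Section 1.3 (Example 9) that the $j$th column vector of a matrix product is a linear combination of the column vectors of the first factor with coefficients coming from the $j$th column of the second factor, it follows that these relationships can be expressed in matrix form as

$$[u_1 \mid u_2 \mid \cdots \mid u_n] = [q_1 \mid q_2 \mid \cdots \mid q_n]\begin{bmatrix} \langle u_1, q_1\rangle & \langle u_2, q_1\rangle & \cdots & \langle u_n, q_1\rangle\\ \langle u_1, q_2\rangle & \langle u_2, q_2\rangle & \cdots & \langle u_n, q_2\rangle\\ \vdots & \vdots & & \vdots\\ \langle u_1, q_n\rangle & \langle u_2, q_n\rangle & \cdots & \langle u_n, q_n\rangle \end{bmatrix}$$

or more briefly as

$$A = QR \tag{14}$$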

where $R$ is the second factor in the product. However, it is a property of the Gram-Schmidt process that for $j \ge 2$, the vector $q_j$ is orthogonal to $u_1, u_2, \ldots, u_{j-1}$ (these vectors lie in the span of $q_1, q_2, \ldots, q_{j-1}$). Thus, all entries below the main diagonal of $R$ are zero, and $R$ has the form

$$R = \begin{bmatrix}\langle u_1, q_1\rangle & \langle u_2, q_1\rangle & \cdots & \langle u_n, q_1\rangle\\ 0 & \langle u_2, q_2\rangle & \cdots & \langle u_n, q_2\rangle\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & \langle u_n, q_n\rangle\end{bmatrix} \tag{15}$$

We leave it for you to show that $R$ is invertible by showing that its diagonal entries are nonzero. Thus, Equation 14 is a factorization of $A$ into the product of a matrix $Q$ with orthonormal column vectors and an invertible upper triangular matrix $R$. We call Equation 14 the QR-decomposition of $A$. In summary, we have the following theorem.


QR-Decomposition

If $A$ is an $m \times n$ matrix with linearly independent column vectors, then $A$ can be factored as

$$A = QR$$

where $Q$ is an $m \times n$ matrix with orthonormal column vectors, and $R$ is an $n \times n$ invertible upper triangular matrix.


It is common in numerical linear algebra to say 
that a matrix with linearly independent columns 
has full column rank. 


Recall from Theorem 5.1.6 (the Equivalence Theorem) that a square matrix has linearly independent column vectors if and only if it is invertible. Thus, it follows from the foregoing theorem that every invertible matrix has a QR-decomposition.


EXAMPLE 9 QR-Decomposition of a 3 x 3 Matrix 


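Find the QR-decomposition of

$$A = \begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\end{bmatrix}$$

The column vectors of $A$ are

$$u_1 = \begin{bmatrix}1\\ 1\\ 1\end{bmatrix},\quad u_2 = \begin{bmatrix}0\\ 1\\ 1\end{bmatrix},\quad u_3 = \begin{bmatrix}0\\ 0\\ 1\end{bmatrix}$$

Applying the Gram-Schmidt process with normalization to these column vectors yields the orthonormal vectors (see Example 7)

$$q_1 = \begin{bmatrix}\frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{3}}\end{bmatrix},\quad q_2 = \begin{bmatrix}-\frac{2}{\sqrt{6}}\\ \frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{6}}\end{bmatrix},\quad q_3 = \begin{bmatrix}0\\ -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix}$$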


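Thus, it follows from Formula 15 that $R$ is

$$R = \begin{bmatrix}\langle u_1, q_1\rangle & \langle u_2, q_1\rangle & \langle u_3, q_1\rangle\\ 0 & \langle u_2, q_2\rangle & \langle u_3, q_2\rangle\\ 0 & 0 & \langle u_3, q_3\rangle\end{bmatrix} = \begin{bmatrix}\frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}}\\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}}\\ 0 & 0 & \frac{1}{\sqrt{2}}\end{bmatrix}$$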


Show that the matrix $Q$ in Example 9 has the property $Q^TQ = I$, and show that every $m \times n$ matrix with orthonormal column vectors has this property.

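from which it follows that the QR-decomposition of $A$ is

$$\underbrace{\begin{bmatrix}1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\end{bmatrix}}_{A} = \underbrace{\begin{bmatrix}\frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0\\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}}\end{bmatrix}}_{Q}\underbrace{\begin{bmatrix}\frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}}\\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}}\\ 0 & 0 & \frac{1}{\sqrt{2}}\end{bmatrix}}_{R}$$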
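The construction of $Q$ and $R$ in Formula 15 can be sketched in code. This is an illustration under stated assumptions, not a numerically robust routine: it uses plain Python lists, floating-point arithmetic, classical Gram-Schmidt, and hypothetical helper names (`qr_gram_schmidt`, `matmul`, `transpose`). Production libraries typically compute QR by other methods, such as Householder reflections. Applied to the matrix of Example 9, it reproduces the $R$ found above up to rounding and lets one check numerically that $QR = A$ and $Q^TQ = I$.

```python
import math

def inner(u, v):
    """Euclidean inner product on R^m."""
    return sum(a * b for a, b in zip(u, v))

def qr_gram_schmidt(A, tol=1e-12):
    """QR-decomposition of an m x n matrix A (a list of rows) with independent columns.

    Columns of Q are the normalized Gram-Schmidt vectors q_j, and
    R[i][j] = <u_j, q_i> for i <= j, zero below the diagonal (Formula 15).
    """
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    qs = []
    for u in cols:
        v = list(u)
        for q in qs:                          # subtract <u, q> q for each earlier q
            c = inner(u, q)
            v = [a - c * b for a, b in zip(v, q)]
        length = math.sqrt(inner(v, v))
        if length < tol:
            raise ValueError("columns of A are not linearly independent")
        qs.append([a / length for a in v])
    Q = [[qs[j][i] for j in range(n)] for i in range(m)]
    R = [[inner(cols[j], qs[i]) if i <= j else 0.0 for j in range(n)] for i in range(n)]
    return Q, R

def matmul(X, Y):
    """Matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

A = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]   # the matrix of Example 9
Q, R = qr_gram_schmidt(A)
QR = matmul(Q, R)                        # should reproduce A up to rounding
QtQ = matmul(transpose(Q), Q)            # should be close to the 3 x 3 identity
for row in R:
    print([round(x, 4) for x in row])
```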


Concept Review 

Orthogonal and orthonormal sets 
Normalizing a vector 
Orthogonal projections 
Gram-Schmidt process 
QR-decomposition


Skills 


















Determine whether a set of vectors is orthogonal (or orthonormal). 

Compute the coordinates of a vector with respect to an orthogonal (or orthonormal) basis. 

Find the orthogonal projection of a vector onto a subspace. 

Use the Gram-Schmidt process to construct an orthogonal (or orthonormal) basis for an inner product 
space. 

Find the QR-decomposition of an invertible matrix.


Exercise Set 6.3 


1. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on $R^2$?
(a) (0, 1), (2, 0)


(b) 

(c) 


1 1 


1 1 


\ 


_j_i_\ lj _i_ 

l /2' /sj’l/2'/2 
(d) (0, 0), (0, 1)


Answer: 

(a), (b), (d) 

2. Which of the sets in Exercise 1 are orthonormal with respect to the Euclidean inner product on $R^2$?

3. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on $R^3$?


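(b) $\left(\frac{2}{3}, -\frac{2}{3}, \frac{1}{3}\right), \left(\frac{2}{3}, \frac{1}{3}, -\frac{2}{3}\right), \left(\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right)$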


(c) 

0 , 0 . 0 ), 

(d) / 


o 7?’ 7? J’ 


J_1_2_ _ 

(/?'/?' /ej’l/s 


1 L.o' 


f2- 


Answer: 


(b), (d) 


4. Which of the sets in Exercise 3 are orthonormal with respect to the Euclidean inner product on $R^3$?

5. Which of the following sets of polynomials are orthonormal with respect to the inner product on P 2 discussed in 
Example 7 of Section 6.1 ? 

(a) $p_1(x) = \frac{2}{3} - \frac{2}{3}x + \frac{1}{3}x^2$, $p_2(x) = \frac{2}{3} + \frac{1}{3}x - \frac{2}{3}x^2$, $p_3(x) = \frac{1}{3} + \frac{2}{3}x + \frac{2}{3}x^2$










(b) $p_1(x) = 1$, $p_2(x) = \frac{1}{\sqrt{2}}x + \frac{1}{\sqrt{2}}x^2$, $p_3(x) = x^2$


Answer: 


(a) 


6. Which of the following sets of matrices are orthonormal with respect to the inner product on M 22 discussed in 
Example 6 of Section 6.1 ? 


1 0 
0 0 


(b) 


‘ I 



0 !l 


0 - 


3 


3 


2 1 

* 

2 2 


3 3 


3 3 


ri 01 


0 r 


1 

0 

0 


O 

O 

0 

0 

1_ 

7 

0 

0 

_1 

7 

L> 1 

7 

L> -'J 


7. Verify that the given vectors form an orthogonal set with respect to the Euclidean inner product; then convert it 
to an orthonormal set by normalizing the vectors. 

(a) (-1,2), (6, 3) 

(b) (1, 0, - 1), (2, 0, 2), (0, 5, 0) 

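(c) $\left(\frac{1}{5}, \frac{1}{5}, \frac{1}{5}\right), \left(-\frac{1}{2}, \frac{1}{2}, 0\right), \left(\frac{1}{3}, \frac{1}{3}, -\frac{2}{3}\right)$

Answer:

(a) $\left(-\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right), \left(\frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}}\right)$

(b) $\left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right), \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), (0, 1, 0)$

(c) $\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\right), \left(\frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}, -\frac{2}{\sqrt{6}}\right)$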


8 . Verify that the set of vectors { (1, 0), (0, 1) } is orthogonal with respect to the inner product 
(u, v J = Au iv 1 I & 2 v 2 on then convert it to an orthonormal set by normalizing the vectors. 

9. Verify that the vectors

$$v_1 = \left(-\frac{3}{5}, \frac{4}{5}, 0\right),\quad v_2 = \left(\frac{4}{5}, \frac{3}{5}, 0\right),\quad v_3 = (0, 0, 1)$$

form an orthonormal basis for $R^3$ with the Euclidean inner product; then use Theorem 6.3.2(b) to express each of the following as linear combinations of $v_1$, $v_2$, and $v_3$.

(a) (1, -1, 2)

(b) (3, -7, 4)

(c) $\left(\frac{1}{7}, -\frac{3}{7}, \frac{5}{7}\right)$


Answer:

(a) $-\frac{7}{5}v_1 + \frac{1}{5}v_2 + 2v_3$

(b) $-\frac{37}{5}v_1 - \frac{9}{5}v_2 + 4v_3$

(c) $-\frac{3}{7}v_1 - \frac{1}{7}v_2 + \frac{5}{7}v_3$


10. Verify that the vectors

$$v_1 = (1, -1, 2, -1),\quad v_2 = (-2, 2, 3, 2),\quad v_3 = (1, 2, 0, -1),\quad v_4 = (1, 0, 0, 1)$$

form an orthogonal basis for $R^4$ with the Euclidean inner product; then use Theorem 6.3.2(a) to express each of the following as linear combinations of $v_1$, $v_2$, $v_3$, and $v_4$.

(a) (1, 1, 1, 1)

(b) $(\sqrt{2}, -3\sqrt{2}, 5\sqrt{2}, -\sqrt{2})$

(c) $\left(-\frac{1}{3}, \frac{2}{3}, -\frac{1}{3}, \frac{4}{3}\right)$


11. (a) Show that the vectors

$$v_1 = (1, -2, 3, -4),\quad v_2 = (2, 1, -4, -3),\quad v_3 = (-3, 4, 1, -2),\quad v_4 = (4, 3, 2, 1)$$

form an orthogonal basis for $R^4$ with the Euclidean inner product.

(b) Use Theorem 6.3.2(a) to express $u = (-1, 2, 3, 7)$ as a linear combination of the vectors in part (a).


Answer:

(b) $u = -\frac{4}{5}v_1 - \frac{11}{10}v_2 + 0v_3 + \frac{1}{2}v_4$

In Exercises 12-13, an orthonormal basis with respect to the Euclidean inner product is given. Use Theorem 6.3.2(b) to find the coordinate vector of $w$ with respect to that basis.


12 . 


<“)„=(3,7 );„ 1 = (J=,--Lj,u 2 = 


J_ 1 

1 / 2 ' /2 


(b) «r=(-1,0,2);«i = (|. j, |,|j 

13 '(a) w = ( 2 , 0 . 5).», = (§, I « 2 = (1 f, -§),» 3 = (§,-§.-l) 


(b) w=(-1,l,2),u 1 = ' 3 1 1 


/if' fn' /ITj’” 2- 1 /?' /?' /?)• 


u 3 = 


1 


1 


\ 


{ 66 ’ { 66 ’ {66 


Answer: 


W= -y-ui - |u 2 - JU 3 


(a) 




In Exercises 14-15, the given vectors are orthogonal with respect to the Euclidean inner product. Find $\operatorname{proj}_W x$, where $x = (1, 2, 0, -2)$ and $W$ is the subspace of $R^4$ spanned by the vectors.


14. (a) $v_1 = (1, 1, 1, 1)$, $v_2 = (1, 1, -1, -1)$

(b) $v_1 = (0, 1, -4, -1)$, $v_2 = (3, 5, 1, 1)$

15. (a) $v_1 = (1, 1, 1, 1)$, $v_2 = (1, 1, -1, -1)$, $v_3 = (1, -1, 1, -1)$

(b) $v_1 = (0, 1, -4, -1)$, $v_2 = (3, 5, 1, 1)$, $v_3 = (1, 0, 1, -4)$


Answer:

(a) $\left(\frac{7}{4}, \frac{5}{4}, -\frac{3}{4}, -\frac{5}{4}\right)$

(b) $\left(\frac{17}{12}, \frac{7}{4}, -\frac{1}{12}, -\frac{23}{12}\right)$

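In Exercises 16-17, the given vectors are orthonormal with respect to the Euclidean inner product. Use Theorem 6.3.4(b) to find $\operatorname{proj}_W x$, where $x = (1, 2, 0, -1)$ and $W$ is the subspace of $R^4$ spanned by the vectors.

16. (a) $v_1 = \left(0, \frac{1}{\sqrt{18}}, -\frac{4}{\sqrt{18}}, -\frac{1}{\sqrt{18}}\right)$, $v_2 = \left(\frac{1}{2}, \frac{5}{6}, \frac{1}{6}, \frac{1}{6}\right)$

(b) $v_1 = \left(\frac{1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{1}{2}\right)$, $v_2 = \left(\frac{1}{2}, \frac{1}{2}, -\frac{1}{2}, -\frac{1}{2}\right)$

17. (a) $v_1 = \left(0, \frac{1}{\sqrt{18}}, -\frac{4}{\sqrt{18}}, -\frac{1}{\sqrt{18}}\right)$, $v_2 = \left(\frac{1}{2}, \frac{5}{6}, \frac{1}{6}, \frac{1}{6}\right)$, $v_3 = \left(\frac{1}{\sqrt{18}}, 0, \frac{1}{\sqrt{18}}, -\frac{4}{\sqrt{18}}\right)$

(b) $v_1 = \left(\frac{1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{1}{2}\right)$, $v_2 = \left(\frac{1}{2}, \frac{1}{2}, -\frac{1}{2}, -\frac{1}{2}\right)$, $v_3 = \left(\frac{1}{2}, -\frac{1}{2}, \frac{1}{2}, -\frac{1}{2}\right)$


Answer:

(a) $\left(\frac{23}{18}, \frac{11}{6}, -\frac{1}{18}, -\frac{17}{18}\right)$

(b) $\left(\frac{3}{2}, \frac{3}{2}, -\frac{1}{2}, -\frac{1}{2}\right)$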

18. In Example 6 of Section 4.9 we found the orthogonal projection of the vector $x = (1, 5)$ onto the line through the origin making an angle of $\pi/6$ radians with the $x$-axis. Solve that same problem using Theorem 6.3.4.

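19. Find the vectors $w_1$ in $W$ and $w_2$ in $W^\perp$ such that $x = w_1 + w_2$, where $x$ and $W$ are as given in

(a) Exercise 14(a).

(b) Exercise 15(a).


Answer:

(a) $w_1 = \left(\frac{3}{2}, \frac{3}{2}, -1, -1\right)$, $w_2 = \left(-\frac{1}{2}, \frac{1}{2}, 1, -1\right)$

(b) $w_1 = \left(\frac{7}{4}, \frac{5}{4}, -\frac{3}{4}, -\frac{5}{4}\right)$, $w_2 = \left(-\frac{3}{4}, \frac{3}{4}, \frac{3}{4}, -\frac{3}{4}\right)$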


20. Find the vectors $w_1$ in $W$ and $w_2$ in $W^\perp$ such that $x = w_1 + w_2$, where $x$ and $W$ are as given in

(a) Exercise 16(a). 

(b) Exercise 17(a). 

21. Let $R^2$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{u_1, u_2\}$ into an orthonormal basis. Draw both sets of basis vectors in the $xy$-plane.

(a) $u_1 = (1, -3)$, $u_2 = (2, 2)$

(b) $u_1 = (1, 0)$, $u_2 = (3, -5)$


Answer:

(a) $v_1 = \left(\frac{1}{\sqrt{10}}, -\frac{3}{\sqrt{10}}\right)$, $v_2 = \left(\frac{3}{\sqrt{10}}, \frac{1}{\sqrt{10}}\right)$

(b) $v_1 = (1, 0)$, $v_2 = (0, -1)$

22. Let $R^3$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{u_1, u_2, u_3\}$ into an orthonormal basis.

(a) $u_1 = (1, 1, 1)$, $u_2 = (-1, 1, 0)$, $u_3 = (1, 2, 1)$

(b) $u_1 = (1, 0, 0)$, $u_2 = (3, 7, -2)$, $u_3 = (0, 4, 1)$

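23. Let $R^4$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{u_1, u_2, u_3, u_4\}$ into an orthonormal basis.

$$u_1 = (0, 2, 1, 0),\quad u_2 = (1, -1, 0, 0),\quad u_3 = (1, 2, 0, -1),\quad u_4 = (1, 0, 0, 1)$$


Answer:

$$v_1 = \left(0, \frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}}, 0\right),\quad v_2 = \left(\frac{5}{\sqrt{30}}, -\frac{1}{\sqrt{30}}, \frac{2}{\sqrt{30}}, 0\right),$$

$$v_3 = \left(\frac{1}{\sqrt{10}}, \frac{1}{\sqrt{10}}, -\frac{2}{\sqrt{10}}, -\frac{2}{\sqrt{10}}\right),\quad v_4 = \left(\frac{1}{\sqrt{15}}, \frac{1}{\sqrt{15}}, -\frac{2}{\sqrt{15}}, \frac{3}{\sqrt{15}}\right)$$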

24. Let $R^3$ have the Euclidean inner product. Find an orthonormal basis for the subspace spanned by $(0, 1, 2)$, $(-1, 0, 1)$, $(-1, 1, 3)$.

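25. Let $R^3$ have the inner product

$$\langle u, v\rangle = u_1v_1 + 2u_2v_2 + 3u_3v_3$$

Use the Gram-Schmidt process to transform $u_1 = (1, 1, 1)$, $u_2 = (1, 1, 0)$, $u_3 = (1, 0, 0)$ into an orthonormal
basis.

Answer:

$v_1 = \left(\frac{1}{\sqrt6}, \frac{1}{\sqrt6}, \frac{1}{\sqrt6}\right)$, $v_2 = \left(\frac{1}{\sqrt6}, \frac{1}{\sqrt6}, -\frac{1}{\sqrt6}\right)$, $v_3 = \left(\frac{2}{\sqrt6}, -\frac{1}{\sqrt6}, 0\right)$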


26- Let R 3 have the Euclidean inner product. The subspace of r} spanned by the vectors ui = 0 , — J and 

$u_2 = (0, 1, 0)$ is a plane passing through the origin. Express $w = (1, 2, 3)$ in the form $w = w_1 + w_2$, where $w_1$
lies in the plane and $w_2$ is perpendicular to the plane.

27. Repeat Exercise 26 with $u_1 = (1, 1, 1)$ and $u_2 = (2, 0, -1)$.

Answer:

$w_1 = \left(\frac{13}{14}, \frac{31}{14}, \frac{40}{14}\right)$, $w_2 = \left(\frac{1}{14}, -\frac{3}{14}, \frac{2}{14}\right)$

28. Let $R^4$ have the Euclidean inner product. Express the vector $w = (-1, 2, 6, 0)$ in the form $w = w_1 + w_2$,
where $w_1$ is in the space $W$ spanned by $u_1 = (-1, 0, 1, 2)$ and $u_2 = (0, 1, 0, 1)$, and $w_2$ is orthogonal to $W$.

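29. Find the QR-decomposition of the matrix, where possible.

(a) $\begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 4 \end{bmatrix}$ (c) $\begin{bmatrix} 1 & 1 \\ -2 & 1 \\ 2 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix}$ (e) $\begin{bmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \\ 0 & 3 & 1 \end{bmatrix}$ (f) $\begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 1 \\ 1 & 0 & 1 \\ -1 & 1 & 1 \end{bmatrix}$

Answer:

(a) $Q = \begin{bmatrix} \frac{1}{\sqrt5} & -\frac{2}{\sqrt5} \\ \frac{2}{\sqrt5} & \frac{1}{\sqrt5} \end{bmatrix}$, $R = \begin{bmatrix} \sqrt5 & \sqrt5 \\ 0 & \sqrt5 \end{bmatrix}$

(b) $Q = \begin{bmatrix} \frac{1}{\sqrt2} & -\frac{1}{\sqrt3} \\ 0 & \frac{1}{\sqrt3} \\ \frac{1}{\sqrt2} & \frac{1}{\sqrt3} \end{bmatrix}$, $R = \begin{bmatrix} \sqrt2 & 3\sqrt2 \\ 0 & \sqrt3 \end{bmatrix}$

(c) $Q = \begin{bmatrix} \frac{1}{3} & \frac{8}{\sqrt{234}} \\ -\frac{2}{3} & \frac{11}{\sqrt{234}} \\ \frac{2}{3} & \frac{7}{\sqrt{234}} \end{bmatrix}$, $R = \begin{bmatrix} 3 & \frac{1}{3} \\ 0 & \frac{\sqrt{26}}{3} \end{bmatrix}$

(d) $Q = \begin{bmatrix} \frac{1}{\sqrt2} & -\frac{1}{\sqrt3} & \frac{1}{\sqrt6} \\ 0 & \frac{1}{\sqrt3} & \frac{2}{\sqrt6} \\ \frac{1}{\sqrt2} & \frac{1}{\sqrt3} & -\frac{1}{\sqrt6} \end{bmatrix}$, $R = \begin{bmatrix} \sqrt2 & \sqrt2 & \sqrt2 \\ 0 & \sqrt3 & -\frac{1}{\sqrt3} \\ 0 & 0 & \frac{4}{\sqrt6} \end{bmatrix}$

(e) $Q = \begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt{38}} & -\frac{3}{\sqrt{19}} \\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt{38}} & \frac{3}{\sqrt{19}} \\ 0 & \frac{6}{\sqrt{38}} & \frac{1}{\sqrt{19}} \end{bmatrix}$, $R = \begin{bmatrix} \sqrt2 & \frac{3}{\sqrt2} & \sqrt2 \\ 0 & \frac{\sqrt{19}}{\sqrt2} & \frac{6}{\sqrt{38}} \\ 0 & 0 & \frac{1}{\sqrt{19}} \end{bmatrix}$

(f) Columns not linearly independent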
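The QR-decompositions in Exercise 29 follow the Gram-Schmidt construction of Theorem 6.3.7, in which $R_{ij} = \langle u_j, q_i\rangle$. The floating-point sketch below uses hypothetical helper names, and its tolerance `1e-12` for detecting dependent columns is an arbitrary choice; it reproduces part (d) and rejects part (f).

```python
import math


def qr_decompose(A):
    """QR-decomposition of A (a list of rows) with linearly independent columns,
    by the Gram-Schmidt process applied to the columns u_1, ..., u_n.

    Returns (Q, R): Q has orthonormal columns q_1, ..., q_n, R is upper
    triangular with R[i][j] = <u_j, q_i>, and A = QR.
    """
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    qs = []
    R = [[0.0] * n for _ in range(n)]
    for j, u in enumerate(cols):
        v = u[:]
        for i, q in enumerate(qs):
            R[i][j] = sum(a * b for a, b in zip(u, q))  # <u_j, q_i>
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:
            raise ValueError("columns are not linearly independent")
        R[j][j] = norm  # equals <u_j, q_j>
        qs.append([a / norm for a in v])
    Q = [[qs[j][i] for j in range(n)] for i in range(m)]
    return Q, R


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# Exercise 29(d)
A = [[1, 0, 2], [0, 1, 1], [1, 2, 0]]
Q, R = qr_decompose(A)
```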



30. In Step 3 of the proof of Theorem 6.3.5, it was stated that “the linear independence of $\{u_1, u_2, \ldots, u_n\}$ ensures
that $v_3 \ne 0$.” Prove this statement.

31. Prove that the diagonal entries of $R$ in Formula 15 are nonzero.

32. Calculus required Use Theorem 6.3.2(a) to express the following polynomials as linear combinations of the first
three Legendre polynomials (see the Remark following Example 8).

(a) $1 + x + 4x^2$

(b) $2 - 7x^2$

(c) $4 + 3x$


33. Calculus required Let $P_2$ have the inner product

$$\langle p, q\rangle = \int_0^1 p(x)q(x)\,dx$$

Apply the Gram-Schmidt process to transform the standard basis $S = \{1, x, x^2\}$ into an orthonormal basis.

Answer:

$v_1 = 1$, $v_2 = \sqrt3\,(2x - 1)$, $v_3 = \sqrt5\,(6x^2 - 6x + 1)$

34. Find vectors $x$ and $y$ in $R^2$ that are orthonormal with respect to the inner product $\langle u, v\rangle = 3u_1v_1 + 2u_2v_2$ but
are not orthonormal with respect to the Euclidean inner product.

True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) Every linearly independent set of vectors in an inner product space is orthogonal. 

Answer: 

False 

(b) Every orthogonal set of vectors in an inner product space is linearly independent. 

Answer: 

False 

(c) Every nontrivial subspace of R has an orthonormal basis with respect to the Euclidean inner product. 
Answer: 

True 

(d) Every nonzero finite-dimensional inner product space has an orthonormal basis. 

Answer: 

True 

(e) $\mathrm{proj}_W x$ is orthogonal to every vector of $W$.


Answer: 



False 

(f) If $A$ is an $n \times n$ matrix with a nonzero determinant, then $A$ has a QR-decomposition.
Answer: 

True 




6.4 Best Approximation; Least Squares 

In this section we will be concerned with linear systems that cannot be solved exactly and for which an approximate solution is 
needed. Such systems commonly occur in applications where measurement errors “perturb” the coefficients of a consistent system 
sufficiently to produce inconsistency. 


Least Squares Solutions of Linear Systems 

Suppose that $Ax = b$ is an inconsistent linear system of $m$ equations in $n$ unknowns in which we suspect the inconsistency to be
caused by measurement errors in the coefficients of $A$. Since no exact solution is possible, we will look for a vector $x$ that comes as
“close as possible” to being a solution in the sense that it minimizes $\|b - Ax\|$ with respect to the Euclidean inner product on $R^m$.
You can think of $Ax$ as an approximation to $b$ and $\|b - Ax\|$ as the error in that approximation: the smaller the error, the better
the approximation. This leads to the following problem.


Least Squares Problem

Given a linear system $Ax = b$ of $m$ equations in $n$ unknowns, find a vector $x$ that minimizes $\|b - Ax\|$ with respect to the
Euclidean inner product on $R^m$. We call such an $x$ a least squares solution of the system, we call $b - Ax$ the least squares
error vector, and we call $\|b - Ax\|$ the least squares error.


To clarify the above terminology, suppose that the matrix form of $b - Ax$ is

$$b - Ax = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix}$$

The term “least squares solution” results from the fact that minimizing $\|b - Ax\|$ also minimizes $\|b - Ax\|^2 = e_1^2 + e_2^2 + \cdots + e_m^2$.


Best Approximation 


Suppose that $b$ is a fixed vector in $R^n$ that we would like to approximate by a vector $w$ that is required to lie in some subspace $W$
of $R^n$. Unless $b$ happens to be in $W$, any such approximation will result in an “error vector” $b - w$ that cannot be made equal
to $0$ no matter how $w$ is chosen (Figure 6.4.1a). However, by choosing

$$w = \mathrm{proj}_W b$$

we can make the length of the error vector

$$\|b - w\| = \|b - \mathrm{proj}_W b\|$$

as small as possible (Figure 6.4.1b).

[Figure 6.4.1: (a) the error vector $b - w$ for a vector $w$ in $W$; (b) the error vector $b - \mathrm{proj}_W b$.]




These geometric ideas suggest the following general theorem. 


Best Approximation Theorem

If $W$ is a finite-dimensional subspace of an inner product space $V$, and if $b$ is a vector in $V$, then $\mathrm{proj}_W b$ is the best
approximation to $b$ from $W$ in the sense that

$$\|b - \mathrm{proj}_W b\| < \|b - w\|$$

for every vector $w$ in $W$ that is different from $\mathrm{proj}_W b$.


For every vector $w$ in $W$, we can write

$$b - w = (b - \mathrm{proj}_W b) + (\mathrm{proj}_W b - w) \qquad (1)$$

But $\mathrm{proj}_W b - w$, being a difference of vectors in $W$, is itself in $W$; and since $b - \mathrm{proj}_W b$ is orthogonal to $W$, the two terms on the
right side of 1 are orthogonal. Thus, it follows from the Theorem of Pythagoras (Theorem 6.2.3) that

$$\|b - w\|^2 = \|b - \mathrm{proj}_W b\|^2 + \|\mathrm{proj}_W b - w\|^2$$

Since $w \ne \mathrm{proj}_W b$, it follows that the second term in this sum is positive, and hence that

$$\|b - \mathrm{proj}_W b\|^2 < \|b - w\|^2$$

Since norms are nonnegative, it follows (from a property of inequalities) that

$$\|b - \mathrm{proj}_W b\| < \|b - w\|$$


Least Squares Solutions of Linear Systems 

One way to find a least squares solution of $Ax = b$ is to calculate the orthogonal projection $\mathrm{proj}_W b$ on the column space $W$ of the
matrix $A$ and then solve the equation

$$Ax = \mathrm{proj}_W b \qquad (2)$$

However, we can avoid the need to calculate the projection by rewriting 2 as

$$b - Ax = b - \mathrm{proj}_W b$$

and then multiplying both sides of this equation by $A^T$ to obtain

$$A^T(b - Ax) = A^T(b - \mathrm{proj}_W b) \qquad (3)$$

Since $b - \mathrm{proj}_W b$ is the component of $b$ that is orthogonal to the column space of $A$, it follows from Theorem 4.8.9(b) that this
vector lies in the null space of $A^T$ and hence that

$$A^T(b - \mathrm{proj}_W b) = 0$$

Thus, 3 simplifies to

$$A^T(b - Ax) = 0$$

which we can rewrite as

$$A^TAx = A^Tb \qquad (4)$$

This is called the normal equation or the normal system associated with $Ax = b$. When viewed as a linear system, the individual
equations are called the normal equations associated with $Ax = b$.

In summary, we have established the following result. 


THEOREM 6.4.2

For every linear system $Ax = b$, the associated normal system

$$A^TAx = A^Tb \qquad (5)$$

is consistent, and all solutions of 5 are least squares solutions of $Ax = b$. Moreover, if $W$ is the column space of $A$, and $x$ is
any least squares solution of $Ax = b$, then the orthogonal projection of $b$ on $W$ is

$$\mathrm{proj}_W b = Ax \qquad (6)$$


If a linear system is consistent, then its exact solutions are 
the same as its least squares solutions, in which case the 
error is zero. 

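EXAMPLE 1 Least Squares Solution

Find all least squares solutions of the linear system

$$\begin{aligned} x_1 - x_2 &= 4 \\ 3x_1 + 2x_2 &= 1 \\ -2x_1 + 4x_2 &= 3 \end{aligned}$$

Find the error vector and the error.

Solution

It will be convenient to express the system in the matrix form $Ax = b$, where

$$A = \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix}$$

It follows that

$$A^TA = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}$$

$$A^Tb = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix}\begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$$

so the normal system $A^TAx = A^Tb$ is

$$\begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$$

Solving this system yields a unique least squares solution, namely,

$$x_1 = \frac{17}{95}, \qquad x_2 = \frac{143}{285}$$

The error vector is

$$b - Ax = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix}\begin{bmatrix} \frac{17}{95} \\ \frac{143}{285} \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} -\frac{92}{285} \\ \frac{439}{285} \\ \frac{94}{57} \end{bmatrix} = \begin{bmatrix} \frac{1232}{285} \\ -\frac{154}{285} \\ \frac{77}{57} \end{bmatrix}$$

Since $b - Ax = \frac{77}{285}(16, -2, 5)$ and $16^2 + (-2)^2 + 5^2 = 285$, the error is

$$\|b - Ax\| = \frac{77}{\sqrt{285}} \approx 4.561$$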
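The normal-system computation of Example 1 can be reproduced exactly. The following minimal Python sketch is an illustration of Theorem 6.4.2 rather than anything prescribed by the text: it uses the standard-library `fractions` module so that $17/95$ and $143/285$ come out exactly, and the helper names `solve` and `least_squares` are hypothetical. Plain Gauss-Jordan elimination is used, which assumes $A^TA$ is invertible.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination in exact
    Fractions. Assumes M is invertible (otherwise next() raises StopIteration)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]


def least_squares(A, b):
    """A least squares solution of Ax = b from the normal system A^T A x = A^T b."""
    At = transpose(A)
    AtA = [[sum(p * q for p, q in zip(r1, r2)) for r2 in At] for r1 in At]
    Atb = [sum(p * q for p, q in zip(r, b)) for r in At]
    return solve(AtA, Atb)


# Example 1
A = [[1, -1], [3, 2], [-2, 4]]
b = [4, 1, 3]
x = least_squares(A, b)
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
e = [bi - yi for bi, yi in zip(b, Ax)]   # least squares error vector b - Ax
err_sq = sum(v * v for v in e)           # ||b - Ax||^2, exactly 5929/285
print(x, e, float(err_sq) ** 0.5)
```

A useful extra check is that $A^T(b - Ax) = 0$, which is exactly the condition the normal system encodes.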


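EXAMPLE 2 Orthogonal Projection on a Subspace

Find the orthogonal projection of the vector $u = (-3, -3, 8, 9)$ on the subspace of $R^4$ spanned by the vectors

$$u_1 = (3, 1, 0, 1), \quad u_2 = (1, 2, 1, 1), \quad u_3 = (-1, 0, 2, -1)$$

Solution

We could solve this problem by first using the Gram-Schmidt process to convert $\{u_1, u_2, u_3\}$ into an
orthonormal basis and then applying the method used in Example 6 of Section 6.3. However, the following method
is more efficient.

The subspace $W$ of $R^4$ spanned by $u_1$, $u_2$, and $u_3$ is the column space of the matrix

$$A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix}$$

Thus, if $u$ is expressed as a column vector, we can find the orthogonal projection of $u$ on $W$ by finding a least
squares solution of the system $Ax = u$ and then calculating $\mathrm{proj}_W u = Ax$ from the least squares solution. The
computations are as follows: The system $Ax = u$ is

$$\begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix}$$

so

$$A^TA = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & -1 \end{bmatrix}\begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 11 & 6 & -4 \\ 6 & 7 & 0 \\ -4 & 0 & 6 \end{bmatrix}$$

$$A^Tu = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & -1 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix} = \begin{bmatrix} -3 \\ 8 \\ 10 \end{bmatrix}$$

The normal system $A^TAx = A^Tu$ in this case is

$$\begin{bmatrix} 11 & 6 & -4 \\ 6 & 7 & 0 \\ -4 & 0 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ 8 \\ 10 \end{bmatrix}$$

Solving this system yields

$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$$

as the least squares solution of $Ax = u$ (verify), so

$$\mathrm{proj}_W u = Ax = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \\ 4 \\ 0 \end{bmatrix}$$

or, in comma-delimited notation, $\mathrm{proj}_W u = (-2, 3, 4, 0)$.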
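The shortcut used in Example 2 (projecting by solving a normal system instead of orthonormalizing first) can be sketched the same way. The helper `project_onto_column_space` below is a hypothetical name; it assumes the spanning vectors are linearly independent so that $A^TA$ is invertible, and it builds $A^TA$ directly from pairwise dot products of the basis vectors.

```python
from fractions import Fraction


def solve(M, rhs):
    """Gauss-Jordan elimination in exact Fractions; assumes M is invertible."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]


def project_onto_column_space(basis, u):
    """Return (x, proj), where x solves A^T A x = A^T u and proj = Ax is the
    orthogonal projection of u on W = span(basis); the basis vectors are the
    columns of A and are assumed to be linearly independent."""
    AtA = [[sum(a * b for a, b in zip(p, q)) for q in basis] for p in basis]
    Atu = [sum(a * b for a, b in zip(p, u)) for p in basis]
    x = solve(AtA, Atu)
    proj = [sum(xi * p[k] for xi, p in zip(x, basis)) for k in range(len(u))]
    return x, proj


basis = [(3, 1, 0, 1), (1, 2, 1, 1), (-1, 0, 2, -1)]
x, proj = project_onto_column_space(basis, (-3, -3, 8, 9))
print(x, proj)
```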


Uniqueness of Least Squares Solutions 

In general, least squares solutions of linear systems are not unique. Although the linear system in Example 1 turned out to have a 
unique least squares solution, that occurred only because the coefficient matrix of the system happened to satisfy certain conditions 
that guarantee uniqueness. Our next theorem will show what those conditions are. 


THEOREM 6.4.3

If $A$ is an $m \times n$ matrix, then the following are equivalent.

(a) $A$ has linearly independent column vectors.

(b) $A^TA$ is invertible.

We will prove that (a) $\Rightarrow$ (b) and leave the proof that (b) $\Rightarrow$ (a) as an exercise.

(a) $\Rightarrow$ (b) Assume that $A$ has linearly independent column vectors. The matrix $A^TA$ has size $n \times n$, so we can prove that this
matrix is invertible by showing that the linear system $A^TAx = 0$ has only the trivial solution. But if $x$ is any solution of this
system, then $Ax$ is in the null space of $A^T$ and also in the column space of $A$. By Theorem 4.8.9(b) these spaces are orthogonal
complements, so part (b) of Theorem 6.2.4 implies that $Ax = 0$. But $A$ is assumed to have linearly independent column vectors, so
$x = 0$ by Theorem 1.3.1.

As an exercise, try using Formula 7 to solve the problem 
in part (a) of Example 1. 


The next theorem, which follows directly from Theorem 6.4.2 and Theorem 6.4.3, gives an explicit formula for the least squares 
solution of a linear system in which the coefficient matrix has linearly independent column vectors. 


THEOREM 6.4.4 

If $A$ is an $m \times n$ matrix with linearly independent column vectors, then for every $m \times 1$ matrix $b$, the linear system $Ax = b$
has a unique least squares solution. This solution is given by

$$x = (A^TA)^{-1}A^Tb \qquad (7)$$

Moreover, if $W$ is the column space of $A$, then the orthogonal projection of $b$ on $W$ is

$$\mathrm{proj}_W b = Ax = A(A^TA)^{-1}A^Tb \qquad (8)$$


OPTIONAL 

The Role of QR-Decomposition in Least Squares Problems 

Formulas 7 and 8 have theoretical use but are not well suited for numerical computation. In practice, least squares solutions of
$Ax = b$ are typically found by using some variation of Gaussian elimination to solve the normal equations or by using
QR-decomposition and the following theorem.

THEOREM 6.4.5

If $A$ is an $m \times n$ matrix with linearly independent column vectors, and if $A = QR$ is a QR-decomposition of $A$ (see Theorem
6.3.7), then for each $b$ in $R^m$ the system $Ax = b$ has a unique least squares solution given by

$$x = R^{-1}Q^Tb \qquad (9)$$

A proof of this theorem and a discussion of its use can be found in many books on numerical methods of linear algebra. However,
you can obtain Formula 9 by making the substitution $A = QR$ in 7 and using the fact that $Q^TQ = I$ (together with the invertibility of $R$) to obtain

$$\begin{aligned} x &= \left((QR)^T(QR)\right)^{-1}(QR)^Tb \\ &= (R^TQ^TQR)^{-1}R^TQ^Tb \\ &= (R^TR)^{-1}R^TQ^Tb \\ &= R^{-1}(R^T)^{-1}R^TQ^Tb \\ &= R^{-1}Q^Tb \end{aligned}$$
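As a rough illustration of Theorem 6.4.5, the sketch below builds $Q$ and $R$ with the Gram-Schmidt process in floating point and then solves $Rx = Q^Tb$ by back substitution, which computes Formula 9 without forming $R^{-1}$. It assumes $A$ has linearly independent columns (a zero pivot would cause a division error), and the helper names are hypothetical. Classical Gram-Schmidt is used here only for readability; numerical libraries generally prefer Householder reflections or modified Gram-Schmidt for stability.

```python
import math


def qr_columns(A):
    """Gram-Schmidt QR of A (list of rows with independent columns).
    Returns the orthonormal columns q_1, ..., q_n of Q and the upper triangular R."""
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    qs, R = [], [[0.0] * n for _ in range(n)]
    for j, u in enumerate(cols):
        v = u[:]
        for i, q in enumerate(qs):
            R[i][j] = sum(a * b for a, b in zip(u, q))   # <u_j, q_i>
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = math.sqrt(sum(a * a for a in v))
        qs.append([a / R[j][j] for a in v])
    return qs, R


def qr_least_squares(A, b):
    """Least squares solution of Ax = b via A = QR: solve R x = Q^T b
    by back substitution (Formula 9 without computing R^{-1})."""
    qs, R = qr_columns(A)
    y = [sum(qi * bi for qi, bi in zip(q, b)) for q in qs]   # Q^T b
    n = len(R)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / R[i][i]
    return x


# Example 1: the normal-equation answer is x1 = 17/95, x2 = 143/285.
x = qr_least_squares([[1, -1], [3, 2], [-2, 4]], [4, 1, 3])
```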

Orthogonal Projections on Subspaces of $R^m$

In Section 4.8 we showed how to compute orthogonal projections on the coordinate axes of a rectangular coordinate system in $R^3$
and more generally on lines through the origin of $R^3$. We will now consider the problem of finding orthogonal projections on
subspaces of $R^m$. We begin with the following definition.

DEFINITION 1

If $W$ is a subspace of $R^m$, then the linear transformation $P: R^m \to W$ that maps each vector $x$ in $R^m$ into its orthogonal
projection $\mathrm{proj}_W x$ in $W$ is called the orthogonal projection of $R^m$ on $W$.

It follows from Formula 8 that the standard matrix for the transformation $P$ is

$$[P] = A(A^TA)^{-1}A^T \qquad (10)$$

where $A$ is constructed using any basis for $W$ as its column vectors.
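Formula 10 can be evaluated directly with exact arithmetic. In this minimal sketch the basis $\{(1, 0, -5), (0, 1, 3)\}$ is an illustrative choice (it spans the plane $5x - 3y + z = 0$), the helper names are hypothetical, and the final checks $[P]^2 = [P]$ and $[P]^T = [P]$ are the properties that orthogonal projection matrices are expected to have.

```python
from fractions import Fraction


def inverse(M):
    """Inverse of a square matrix by Gauss-Jordan elimination (exact Fractions);
    assumes M is invertible."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(r) for r in zip(*X)]


def projection_matrix(basis):
    """[P] = A (A^T A)^{-1} A^T, where the basis vectors of W are the columns of A."""
    A = transpose(basis)
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)


# Illustrative basis (an assumption for this sketch): W = span{(1, 0, -5), (0, 1, 3)}.
P = projection_matrix([(1, 0, -5), (0, 1, 3)])
```

Any other basis for the same plane yields the same matrix, since $[P]$ depends only on $W$.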

EXAMPLE 3 The Standard Matrix for an Orthogonal Projection on a Line

We showed in Formula 16 of Section 4.9 that

$$P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}$$

is the standard matrix for the orthogonal projection on the line $W$ through the origin of $R^2$ that makes an angle $\theta$ with
the positive $x$-axis. Derive this result using Formula 10.

Solution

The column vectors of $A$ can be formed from any basis for $W$. Since $W$ is one-dimensional, we can take
$w = (\cos\theta, \sin\theta)$ as the basis vector (Figure 6.4.2), so

$$A = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$

We leave it for you to show that $A^TA$ is the $1 \times 1$ identity matrix. Thus, Formula 10 simplifies to

$$[P] = A(A^TA)^{-1}A^T = AA^T = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = P_\theta$$


Another View of Least Squares 

Recall from Theorem 4.8.9 that the null space and row space of an $m \times n$ matrix $A$ are orthogonal complements, as are the null
space of $A^T$ and the column space of $A$. Thus, given a linear system $Ax = b$ in which $A$ is an $m \times n$ matrix, the Projection
Theorem (6.3.3) tells us that the vectors $x$ and $b$ can each be decomposed into sums of orthogonal terms as

$$x = x_{\mathrm{row}(A)} + x_{\mathrm{null}(A)} \quad\text{and}\quad b = b_{\mathrm{null}(A^T)} + b_{\mathrm{col}(A)}$$

where $x_{\mathrm{row}(A)}$ and $x_{\mathrm{null}(A)}$ are the orthogonal projections of $x$ on the row space of $A$ and the null space of $A$, and the vectors
$b_{\mathrm{null}(A^T)}$ and $b_{\mathrm{col}(A)}$ are the orthogonal projections of $b$ on the null space of $A^T$ and the column space of $A$.

In Figure 6.4.3 we have represented the fundamental spaces of $A$ by perpendicular lines in $R^n$ and $R^m$ on which we indicated the
orthogonal projections of $x$ and $b$. (This, of course, is only pictorial since the fundamental spaces need not be one-dimensional.)
The figure shows $Ax$ as a point in the column space of $A$ and conveys that $b_{\mathrm{col}(A)}$ is the point in $\mathrm{col}(A)$ that is closest to $b$. This
illustrates that the least squares solutions of $Ax = b$ are the exact solutions of the equation $Ax = b_{\mathrm{col}(A)}$.

[Figure 6.4.3: row(A) and null(A) drawn as perpendicular lines in $R^n$; col(A) and null($A^T$) drawn as perpendicular lines in $R^m$, with $Ax$ marked in col(A).]


More on the Equivalence Theorem 

As our final result in the main part of this section we will add one additional part to Theorem 5.1.6. 


Equivalent Statements 

If A is an n x n matrix, then the following statements are equivalent. 

(a) A is invertible. 

(b) Ax = 0 has only the trivial solution. 

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices. 

(e) Ax = b is consistent for every n x 1 matrix b.

(f) Ax = b has exactly one solution for every n x 1 matrix b. 

(g) $\det(A) \ne 0$.

(h) The column vectors of A are linearly independent. 

(i) The row vectors of A are linearly independent. 

(j) The column vectors of A span R n . 

(k) The row vectors of A span R n . 

(l) The column vectors of A form a basis for R n . 

(m) The row vectors of A form a basis for R n . 

(n) A has rank n.

(o) A has nullity 0. 

(p) The orthogonal complement of the null space of A is R n . 

(q) The orthogonal complement of the row space of A is {0} . 

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.

(t) $\lambda = 0$ is not an eigenvalue of A.

(u) $A^TA$ is invertible.


The proof of part (u) follows from part (h) of this theorem and Theorem 6.4.3 applied to square matrices.









OPTIONAL 


We now have all the ingredients needed to prove Theorem 6.3.3 in the special case where $V$ is the vector space $R^m$.

We will leave the case where $W = \{0\}$ as an exercise, so assume that $W \ne \{0\}$. Let
$\{v_1, v_2, \ldots, v_k\}$ be any basis for $W$, and form the $m \times k$ matrix $M$ that has these basis vectors as successive columns. This makes
$W$ the column space of $M$ and hence $W^\perp$ the null space of $M^T$. We will complete the proof by showing that every vector $u$ in $R^m$
can be written in exactly one way as

$$u = w_1 + w_2$$

where $w_1$ is in the column space of $M$ and $M^Tw_2 = 0$. However, to say that $w_1$ is in the column space of $M$ is equivalent to saying
that $w_1 = Mx$ for some vector $x$ in $R^k$, and to say that $M^Tw_2 = 0$ is equivalent to saying that $M^T(u - w_1) = 0$. Thus, if we can
show that the equation

$$M^T(u - Mx) = 0 \qquad (11)$$

has a unique solution for $x$, then $w_1 = Mx$ and $w_2 = u - w_1$ will be uniquely determined vectors with the required properties. To
do this, let us rewrite 11 as

$$M^TMx = M^Tu$$

Since the matrix $M$ has linearly independent column vectors, the matrix $M^TM$ is invertible by Theorem 6.4.3 and hence the
equation has a unique solution as required to complete the proof.


Concept Review 

Least squares problem 
Least squares solution 
Least squares error vector 
Least squares error 
Best approximation 
Normal equation 
Orthogonal projection 

Skills 

Find the least squares solution of a linear system. 

Find the error and error vector associated with a least squares solution to a linear system. 
Use the techniques developed in this section to compute orthogonal projections. 

Find the standard matrix of an orthogonal projection. 


Exercise Set 6.4 

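1. Find the normal system associated with the given linear system.

(a) $\begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$

(b) $\begin{bmatrix} 2 & -1 & 0 \\ 3 & 1 & 2 \\ -1 & 4 & 5 \\ 1 & 2 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 2 \end{bmatrix}$

Answer:

(a) $\begin{bmatrix} 21 & 25 \\ 25 & 35 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 20 \\ 20 \end{bmatrix}$

(b) $\begin{bmatrix} 15 & -1 & 5 \\ -1 & 22 & 30 \\ 5 & 30 & 45 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 9 \\ 13 \end{bmatrix}$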


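In Exercises 2-4, find the least squares solution of the linear equation $Ax = b$.

2. (a) $A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix}$; $b = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & -2 \\ 1 & 1 \\ 3 & 1 \end{bmatrix}$; $b = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$

3. (a) $A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \\ -1 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 7 \\ 0 \\ -7 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & -2 \\ 1 & 1 & 0 \\ 1 & 1 & -1 \end{bmatrix}$, $b = \begin{bmatrix} 6 \\ 0 \\ 9 \\ 3 \end{bmatrix}$

Answer:

(a) $x_1 = 5$, $x_2 = \frac{1}{2}$

(b) $x_1 = 12$, $x_2 = -3$, $x_3 = 9$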


4. 


(a) 


(b) 


A = 


A = 


'3 

2 

-1' 


2 

1 

-4 

3 

, b = 

-2 

1 

10 

-7 


1 

"2 

0 

-r 


o' 

1 

-2 

2 

b = 

6 

2 

-1 

0 

0 

0 

1 

-1 


6 


In Exercises 5-6, find the least squares error vector $e = b - Ax$ resulting from the least squares solution $x$ and verify that it is
orthogonal to the column space of $A$.

5. (a) $A$ and $b$ are as in Exercise 3(a).

(b) $A$ and $b$ are as in Exercise 3(b).

Answer:

(a) $e = \begin{bmatrix} \frac{3}{2} \\ \frac{9}{2} \\ -3 \end{bmatrix}$

(b) $e = \begin{bmatrix} 3 \\ -3 \\ 0 \\ 3 \end{bmatrix}$

6. (a) $A$ and $b$ are as in Exercise 4(a).

(b) $A$ and $b$ are as in Exercise 4(b).

7. Find all least squares solutions of $Ax = b$ and confirm that all of the solutions have the same error vector. Compute the least
squares error.


(a) $A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \\ -2 & 1 \end{bmatrix}$; $b = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$

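(b) $A = \begin{bmatrix} 1 & 3 \\ -2 & -6 \\ 3 & 9 \end{bmatrix}$; $b = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$

(c) $A = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{bmatrix}$; $b = \begin{bmatrix} 7 \\ 0 \\ -7 \end{bmatrix}$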


Answer: 

Solution: x = J; least squares error: 

(b) Solution: $x = \left(\frac{2}{7}, 0\right) + t(-3, 1)$ ($t$ a real number); least squares error: $\frac{\sqrt{42}}{7}$

(c) Solution: $x = \left(-\frac{7}{6}, \frac{7}{6}, 0\right) + t(-1, -1, 1)$ ($t$ a real number); least squares error: $\frac{1}{2}\sqrt{294}$

8. Find the orthogonal projection of $u$ on the subspace of $R^3$ spanned by the vectors $v_1$ and $v_2$.

(a) $u = (2, 1, 3)$; $v_1 = (1, 1, 0)$, $v_2 = (1, 2, 1)$

(b) $u = (1, -6, 1)$; $v_1 = (-1, 2, 1)$, $v_2 = (2, 2, 4)$

9. Find the orthogonal projection of $u$ on the subspace of $R^4$ spanned by the vectors $v_1$, $v_2$, and $v_3$.

(a) $u = (6, 3, 9, 6)$; $v_1 = (2, 1, 1, 1)$, $v_2 = (1, 0, 1, 1)$, $v_3 = (-2, -1, 0, -1)$

(b) $u = (-2, 0, 2, 4)$; $v_1 = (1, 1, 3, 0)$, $v_2 = (-2, -1, -2, 1)$, $v_3 = (-3, -1, 1, 3)$

Answer:

(a) $(7, 2, 9, 5)$

(b) $\left(-\frac{12}{5}, -\frac{4}{5}, \frac{12}{5}, \frac{16}{5}\right)$



10. Find the orthogonal projection of $u = (5, 6, 7, 2)$ on the solution space of the homogeneous linear system

$$\begin{aligned} x_1 + x_2 + x_3 &= 0 \\ 2x_2 + x_3 + x_4 &= 0 \end{aligned}$$

11. In each part, find $\det(A^TA)$, and apply Theorem 6.4.3 to determine whether $A$ has linearly independent column vectors.

(a) $A = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{bmatrix}$ (b) $A = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 1 & 1 \\ -1 & 0 & -2 \\ 4 & -5 & 3 \end{bmatrix}$

Answer:

(a) $\det(A^TA) = 0$; $A$ does not have linearly independent column vectors.

(b) $\det(A^TA) = 0$; $A$ does not have linearly independent column vectors.

12. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection $P: R^2 \to R^2$ onto

(a) the $x$-axis.

(b) the $y$-axis.

[Note: Compare your results to Table 3 of Section 4.9.] 

13. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection $P: R^3 \to R^3$ onto

(a) the $xz$-plane.

(b) the $yz$-plane.

[Note: Compare your results to Table 4 of Section 4.9.]

Answer:

(a) $[P] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $[P] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$


14. Show that if $w = (a, b, c)$ is a nonzero vector, then the standard matrix for the orthogonal projection of $R^3$ on the line
$\mathrm{span}\{w\}$ is

$$P = \frac{1}{a^2 + b^2 + c^2}\begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix}$$

15. Let $W$ be the plane with equation $5x - 3y + z = 0$.

(a) Find a basis for $W$.

(b) Use Formula 10 to find the standard matrix for the orthogonal projection on $W$.

(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point $P_0(x_0, y_0, z_0)$ on $W$.

(d) Find the distance between the point $P_0(1, -2, 4)$ and the plane $W$, and check your result using Theorem 3.3.4.

Answer:

(a) $(1, 0, -5)$, $(0, 1, 3)$

(b) $[P] = \frac{1}{35}\begin{bmatrix} 10 & 15 & -5 \\ 15 & 26 & 3 \\ -5 & 3 & 34 \end{bmatrix}$

(c) $\left(\frac{2x_0 + 3y_0 - z_0}{7}, \frac{15x_0 + 26y_0 + 3z_0}{35}, \frac{-5x_0 + 3y_0 + 34z_0}{35}\right)$

(d) $\frac{3\sqrt{35}}{7}$

16. Let $W$ be the line with parametric equations

$$x = 2t, \quad y = -t, \quad z = 4t$$

(a) Find a basis for $W$.

(b) Use Formula 10 to find the standard matrix for the orthogonal projection on $W$.

(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point $P_0(x_0, y_0, z_0)$ on $W$.

(d) Find the distance between the point $P_0(2, 1, -3)$ and the line $W$.

17. In $R^3$, consider the line $l$ given by the equations

$$x = t, \quad y = t, \quad z = t$$

and the line $m$ given by the equations

$$x = s, \quad y = 2s - 1, \quad z = 1$$

Let $P$ be a point on $l$, and let $Q$ be a point on $m$. Find the values of $t$ and $s$ that minimize the distance between the lines by
minimizing the squared distance $\|P - Q\|^2$.

Answer:

$s = t = 1$

18. Prove: If $A$ has linearly independent column vectors, and if $Ax = b$ is consistent, then the least squares solution of $Ax = b$ and
the exact solution of $Ax = b$ are the same.

19. Prove: If $A$ has linearly independent column vectors, and if $b$ is orthogonal to the column space of $A$, then the least squares
solution of $Ax = b$ is $x = 0$.

20. Let $P: R^m \to W$ be the orthogonal projection of $R^m$ onto a subspace $W$.

(a) Prove that $[P]^2 = [P]$.

(b) What does the result in part (a) imply about the composition $P \circ P$?

(c) Show that $[P]$ is symmetric.

21. Let $A$ be an $m \times n$ matrix with linearly independent row vectors. Find a standard matrix for the orthogonal projection of $R^n$
onto the row space of $A$. [Hint: Start with Formula 10.]

Answer:

$[P] = A^T(AA^T)^{-1}A$

22. Prove the implication (b) => (a) of Theorem 6.4.3. 

True-False Exercises 

In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 

(a) If $A$ is an $m \times n$ matrix, then $A^TA$ is a square matrix.

Answer: 

True 

(b) If $A^TA$ is invertible, then $A$ is invertible.

Answer: 


False 






(c) If $A$ is invertible, then $A^TA$ is invertible.

Answer: 

True 

(d) If $Ax = b$ is a consistent linear system, then $A^TAx = A^Tb$ is also consistent.

Answer: 

True 

(e) If $Ax = b$ is an inconsistent linear system, then $A^TAx = A^Tb$ is also inconsistent.

Answer: 

False 

(f) Every linear system has a least squares solution. 

Answer: 

True 

(g) Every linear system has a unique least squares solution. 

Answer: 

False 

(h) If $A$ is an $m \times n$ matrix with linearly independent columns and $b$ is in $R^m$, then $Ax = b$ has a unique least squares solution.
Answer: 

True 
6.5 Least Squares Fitting to Data

In this section we will use results about orthogonal projections in inner product spaces to obtain a technique 
for fitting a line or other polynomial curve to a set of experimentally determined points in the plane. 


Fitting a Curve to Data 

A common problem in experimental work is to obtain a mathematical relationship $y = f(x)$ between two
variables $x$ and $y$ by “fitting” a curve to points in the plane corresponding to various experimentally
determined values of $x$ and $y$, say

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$


On the basis of theoretical considerations or simply by observing the pattern of the points, the experimenter
decides on the general form of the curve $y = f(x)$ to be fitted. Some possibilities are (Figure 6.5.1)

(a) A straight line: $y = a + bx$

(b) A quadratic polynomial: $y = a + bx + cx^2$

(c) A cubic polynomial: $y = a + bx + cx^2 + dx^3$


Because the points are obtained experimentally, there is often some measurement “error” in the data, making 
it impossible to find a curve of the desired form that passes through all the points. Thus, the idea is to choose 
the curve (by determining its coefficients) that “best” fits the data. We begin with the simplest and most 
common case: fitting a straight line to data points. 


[Figure 6.5.1: (a) $y = a + bx$; (b) $y = a + bx + cx^2$; (c) $y = a + bx + cx^2 + dx^3$]


Least Squares Fit of a Straight Line 

Suppose we want to fit a straight line $y = a + bx$ to the experimentally determined points

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$

If the data points were collinear, the line would pass through all $n$ points, and the unknown coefficients $a$ and
$b$ would satisfy the equations








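$$\begin{aligned} y_1 &= a + bx_1 \\ y_2 &= a + bx_2 \\ &\ \ \vdots \\ y_n &= a + bx_n \end{aligned}$$

We can write this system in matrix form as

$$\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

or more compactly as

$$Mv = y \qquad (1)$$

where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad v = \begin{bmatrix} a \\ b \end{bmatrix} \qquad (2)$$

If the data points are not collinear, then it is impossible to find coefficients $a$ and $b$ that satisfy system 1
exactly; that is, the system is inconsistent. In this case we will look for a least squares solution

$$v = v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix}$$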

We call a line $y = a^* + b^*x$ whose coefficients come from a least squares solution a regression line or a
least squares straight line fit to the data. To explain this terminology, recall that a least squares solution of 1
minimizes

$$\|y - Mv\| \qquad (3)$$

If we express the square of 3 in terms of components, we obtain

$$\|y - Mv\|^2 = (y_1 - a - bx_1)^2 + (y_2 - a - bx_2)^2 + \cdots + (y_n - a - bx_n)^2 \qquad (4)$$

If we now let

$$d_1 = |y_1 - a - bx_1|, \quad d_2 = |y_2 - a - bx_2|, \quad \ldots, \quad d_n = |y_n - a - bx_n|$$

then 4 can be written as

$$\|y - Mv\|^2 = d_1^2 + d_2^2 + \cdots + d_n^2 \qquad (5)$$

As illustrated in Figure 6.5.2, the number $d_i$ can be interpreted as the vertical distance between the line
$y = a + bx$ and the data point $(x_i, y_i)$. This distance is a measure of the “error” at the point $(x_i, y_i)$
resulting from the inexact fit of $y = a + bx$ to the data points, the assumption being that the $x_i$ are known
exactly and that all the error is in the measurement of the $y_i$. Since 3 and 5 are minimized by the same vector
$v^*$, the least squares straight line fit minimizes the sum of the squares of the estimated errors $d_i$, hence the
name least squares straight line fit.



$d_i$ measures the vertical error in the least squares straight line.


Normal Equations 

Recall from Theorem 6.4.2 that the least squares solutions of 1 can be obtained by solving the associated
normal system

$$M^TMv = M^Ty$$

the equations of which are called the normal equations.

In the exercises it will be shown that the column vectors of $M$ are linearly independent if and only if the $n$ data
points do not lie on a vertical line in the $xy$-plane. In this case it follows from Theorem 6.4.4 that the least
squares solution is unique and is given by

$$v^* = (M^TM)^{-1}M^Ty$$

In summary, we have the following theorem. 


Uniqueness of the Least Squares Solution 


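Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of two or more data points, not all lying on a vertical
line, and let

$$M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad\text{and}\quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

Then there is a unique least squares straight line fit

$$y = a^* + b^*x$$

to the data points. Moreover,

$$v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix}$$

is given by the formula

$$v^* = (M^TM)^{-1}M^Ty \qquad (6)$$

which expresses the fact that $v = v^*$ is the unique solution of the normal equations

$$M^TMv = M^Ty \qquad (7)$$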


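EXAMPLE 1 Least Squares Straight Line Fit

Find the least squares straight line fit to the four points (0, 1), (1, 3), (2, 4), and (3, 4). (See
Figure 6.5.3.)

[Figure 6.5.3]

Solution We have

$$M = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad M^TM = \begin{bmatrix} 4 & 6 \\ 6 & 14 \end{bmatrix}, \quad\text{and}\quad (M^TM)^{-1} = \frac{1}{10}\begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}$$

$$v^* = (M^TM)^{-1}M^Ty = \frac{1}{10}\begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$$

so the desired line is $y = 1.5 + x$.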
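For a straight line, $M^TM = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$ and $M^Ty = \begin{bmatrix} \sum y_i \\ \sum x_iy_i \end{bmatrix}$, so the normal equations can be solved in closed form. The sketch below (hypothetical function name, exact `Fraction` arithmetic, Cramer's rule for the $2 \times 2$ system) reproduces Example 1.

```python
from fractions import Fraction


def least_squares_line(points):
    """Least squares straight line y = a + b x for the given (x, y) points,
    solving the 2 x 2 normal equations M^T M v = M^T y exactly by Cramer's rule.

    Assumes two or more points not all on a vertical line, so M^T M is invertible.
    """
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) ** 2 for x, _ in points)
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)
    # M^T M = [[n, sx], [sx, sxx]],  M^T y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all points lie on a vertical line")
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


a, b = least_squares_line([(0, 1), (1, 3), (2, 4), (3, 4)])
print(a, b)   # 3/2 1
```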



































EXAMPLE 2 Spring Constant

Hooke's law in physics states that the length $x$ of a uniform spring is a linear function of the
force $y$ applied to it. If we express this relationship as $y = a + bx$, then the coefficient $b$ is
called the spring constant. Suppose a particular unstretched spring has a measured length of 6.1
inches (i.e., $x = 6.1$ when $y = 0$). Forces of 2 pounds, 4 pounds, and 6 pounds are then applied
to the spring, and the corresponding lengths are found to be 7.6 inches, 8.7 inches, and 10.4
inches (see Figure 6.5.4). Find the spring constant.

Figure 6.5.4

x_i (in)   y_i (lb)
  6.1        0
  7.6        2
  8.7        4
 10.4        6

We have

$$M = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix} \quad\text{and}\quad y = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix}$$

so

$$v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^TM)^{-1}M^Ty \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}$$

where the numerical values have been rounded to one decimal place. Thus, the estimated value
of the spring constant is $b^* \approx 1.4$ pounds/inch.
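The same closed-form computation applies to the spring data. Entering the measurements as decimal strings lets `Fraction` store them exactly, so the only rounding is the final one to one decimal place. The hypothetical helper from the straight-line sketch is repeated here so that the block runs on its own.

```python
from fractions import Fraction


def least_squares_line(points):
    """Least squares line y = a + b x via the 2 x 2 normal equations (Cramer's rule).
    Assumes the points do not all lie on a vertical line."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Example 2 data as (length x in inches, force y in pounds).
data = [("6.1", 0), ("7.6", 2), ("8.7", 4), ("10.4", 6)]
a, b = least_squares_line([(Fraction(x), Fraction(y)) for x, y in data])
print(round(float(a), 1), round(float(b), 1))   # -8.6 1.4
```

In exact terms this gives $b^* = 700/493 \approx 1.42$ and $a^* = -4261/493 \approx -8.64$.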
Source: NASA

On October 5, 1991 the Magellan spacecraft entered the atmosphere of Venus and
transmitted the temperature $T$ in kelvins (K) versus the altitude $h$ in kilometers (km) until its signal
was lost at an altitude of about 34 km. Discounting the initial erratic signal, the data strongly
suggested a linear relationship, so a least squares straight line fit was used on the linear part of the
data to obtain the equation

$$T = 737.5 - 8.125h$$

By setting $h = 0$ in this equation, the surface temperature of Venus was estimated at 737.5 K.

[Figure: Temperature of Venusian Atmosphere. Magellan orbit 3213; Date: 5 October 1991; Latitude: 67 N; LTST: 22:05]




Least Squares Fit of a Polynomial 

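The technique described for fitting a straight line to data points can be generalized to fitting a polynomial of specified degree to data points. Let us attempt to fit a polynomial of fixed degree $m$

$$y = a_0 + a_1 x + \cdots + a_m x^m \qquad (8)$$

to $n$ points

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$

Substituting these $n$ values of $x$ and $y$ into (8) yields the $n$ equations

$$\begin{aligned} y_1 &= a_0 + a_1 x_1 + \cdots + a_m x_1^m \\ y_2 &= a_0 + a_1 x_2 + \cdots + a_m x_2^m \\ &\;\;\vdots \\ y_n &= a_0 + a_1 x_n + \cdots + a_m x_n^m \end{aligned}$$

or, in matrix form,

$$\mathbf{y} = M\mathbf{v} \qquad (9)$$

where

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} \qquad (10)$$

As before, the solutions of the normal equations

$$M^T M \mathbf{v} = M^T \mathbf{y}$$

determine the coefficients of the polynomial, and the vector $\mathbf{v}$ minimizes

$$\|\mathbf{y} - M\mathbf{v}\|$$

Conditions that guarantee the invertibility of $M^T M$ are discussed in the exercises (Exercise 7). If $M^T M$ is invertible, then the normal equations have a unique solution $\mathbf{v} = \mathbf{v}^*$, which is given by

$$\mathbf{v}^* = (M^T M)^{-1} M^T \mathbf{y} \qquad (11)$$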
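Formula (11) translates directly into code. The sketch below assumes plain Python lists and a hypothetical helper `polyfit_normal_equations`. It forms $M^T M$ and $M^T \mathbf{y}$ and solves the normal equations by Gaussian elimination with partial pivoting rather than computing an explicit inverse.

```python
def polyfit_normal_equations(xs, ys, m):
    """Return [a0, a1, ..., am] for the least squares polynomial of degree m.

    Builds M with rows [1, x, x^2, ..., x^m], then solves M^T M v = M^T y
    by Gaussian elimination with partial pivoting.
    """
    k = m + 1
    M = [[float(x) ** j for j in range(k)] for x in xs]
    # Normal-equation matrix A = M^T M and right-hand side c = M^T y
    A = [[sum(row[i] * row[j] for row in M) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * y for row, y in zip(M, ys)) for i in range(k)]
    # Forward elimination
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        if A[piv][col] == 0:
            raise ValueError("M^T M is singular")
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution
    v = [0.0] * k
    for i in reversed(range(k)):
        v[i] = (c[i] - sum(A[i][j] * v[j] for j in range(i + 1, k))) / A[i][i]
    return v


# Data from Exercise 3 of this section; these points lie exactly on y = 2 + 5x - 3x^2
coeffs = polyfit_normal_equations([2, 3, 5, 6], [0, -10, -48, -76], 2)
print(coeffs)  # close to [2.0, 5.0, -3.0]
```

Forming $M^T M$ squares the condition number of $M$, so numerical libraries usually prefer a QR factorization of $M$ for high-degree fits. The normal equations are used here only because they mirror (11).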



EXAMPLE 3 Fitting a Quadratic Curve to Data 

According to Newton's second law of motion, a body near the Earth's surface falls vertically downward according to the equation

$$s = s_0 + v_0 t + \tfrac{1}{2} g t^2 \qquad (12)$$

where

$s$ = vertical displacement downward relative to some fixed point
$s_0$ = initial displacement at time $t = 0$
$v_0$ = initial velocity at time $t = 0$
$g$ = acceleration of gravity at the Earth's surface

Suppose that a laboratory experiment is performed to evaluate $g$ from Equation (12) by releasing a weight with unknown initial displacement and velocity and measuring the distance it has fallen at certain times relative to a fixed reference point. Suppose it is found that at times $t = 0.1, 0.2, 0.3, 0.4$, and $0.5$ seconds the weight has fallen $s = -0.18, 0.31, 1.03, 2.48$, and $3.73$ feet, respectively, from the reference point. Find an approximate value of $g$ using these data.

The mathematical problem is to fit a quadratic curve

$$s = a_0 + a_1 t + a_2 t^2 \qquad (13)$$

to the five data points:

$$(0.1, -0.18),\ (0.2, 0.31),\ (0.3, 1.03),\ (0.4, 2.48),\ (0.5, 3.73)$$

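With the appropriate adjustments in notation, the matrices $M$ and $\mathbf{y}$ in (10) are

$$M = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \\ 1 & t_5 & t_5^2 \end{bmatrix} = \begin{bmatrix} 1 & 0.1 & 0.01 \\ 1 & 0.2 & 0.04 \\ 1 & 0.3 & 0.09 \\ 1 & 0.4 & 0.16 \\ 1 & 0.5 & 0.25 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix} = \begin{bmatrix} -0.18 \\ 0.31 \\ 1.03 \\ 2.48 \\ 3.73 \end{bmatrix}$$

Thus, from (11),

$$\mathbf{v}^* = \begin{bmatrix} a_0^* \\ a_1^* \\ a_2^* \end{bmatrix} = (M^T M)^{-1} M^T \mathbf{y} \approx \begin{bmatrix} -0.40 \\ 0.35 \\ 16.1 \end{bmatrix}$$

From (12) and (13), we have $a_2 = \tfrac{1}{2} g$, so the estimated value of $g$ is

$$g = 2a_2^* = 2(16.1) = 32.2 \text{ feet/second}^2$$

If desired, we can also estimate the initial displacement and initial velocity of the weight:

$$s_0 = a_0^* = -0.40 \text{ feet}, \qquad v_0 = a_1^* = 0.35 \text{ feet/second}$$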

In Figure 6.5.5 we have plotted the five data points and the approximating polynomial. 



Figure 6.5.5 


Concept Review 

Least squares straight line fit 

Regression line 

Least squares polynomial fit 


Skills 



























Find the least squares straight line fit to a set of data points. 
Find the least squares polynomial fit to a set of data points. 
Use the techniques of this section to solve applied problems. 


Exercise Set 6.5 

1. Find the least squares straight line fit to the three points (0, 0), (1, 2), and (2, 7). 

Answer: 

$y = -\tfrac{1}{2} + \tfrac{7}{2}x$

2. Find the least squares straight line fit to the four points (0, 1), (2, 0), (3, 1), and (3, 2).

3. Find the quadratic polynomial that best fits the four points (2, 0), (3, −10), (5, −48), and (6, −76).

Answer: 

$y = 2 + 5x - 3x^2$

4. Find the cubic polynomial that best fits the five points (−1, −14), (0, −5), (1, −4), (2, 1), and (3, 22).

5. Show that the matrix $M$ in Equation 2 has linearly independent columns if and only if at least two of the numbers $x_1, x_2, \ldots, x_n$ are distinct.

6. Show that the columns of the $n \times (m + 1)$ matrix $M$ in Equation 10 are linearly independent if $n > m$ and at least $m + 1$ of the numbers $x_1, x_2, \ldots, x_n$ are distinct. [Hint: A nonzero polynomial of degree $m$ has at most $m$ distinct roots.]

7. Let $M$ be the matrix in Equation 10. Using Exercise 6, show that a sufficient condition for the matrix $M^T M$ to be invertible is that $n > m$ and that at least $m + 1$ of the numbers $x_1, x_2, \ldots, x_n$ are distinct.

8. The owner of a rapidly expanding business finds that for the first five months of the year the sales (in 
thousands) are $4.0, $4.4, $5.2, $6.4, and $8.0. The owner plots these figures on a graph and conjectures 
that for the rest of the year, the sales curve can be approximated by a quadratic polynomial. Find the least 
squares quadratic polynomial fit to the sales curve, and use it to project the sales for the twelfth month of 
the year. 

9. A corporation obtains the following data relating the number of sales representatives on its staff to annual sales:

Number of Sales Representatives    5     10    15    20    25    30
Annual Sales (millions)            3.4   4.3   5.2   6.1   7.2   8.3


Explain how you might use least squares methods to estimate the annual sales with 45 representatives, and 
discuss the assumptions that you are making. (You need not perform the actual computations.) 










10. Pathfinder is an experimental, lightweight, remotely piloted, solar-powered aircraft that was used in a series of experiments by NASA to determine the feasibility of applying solar power for long-duration, high-altitude flight. In August 1997 Pathfinder recorded the data in the accompanying table relating altitude $H$ and temperature $T$. Show that a linear model is reasonable by plotting the data, and then find the least squares line $H = H_0 + kT$ of best fit.

Table Ex-10

Altitude H (thousands of feet)    15     20      25       30       35       40       45
Temperature T (°C)                4.5    -5.9    -16.1    -27.6    -39.8    -50.2    -62.9


11. Find a curve of the form $y = a + (b/x)$ that best fits the data points (1, 7), (3, 3), (6, 1) by making the substitution $X = 1/x$. Draw the curve and plot the data points in the same coordinate system.

Answer:

[Graph of the fitted curve and the data points]

True-False Exercises 

In parts (a)-(d) determine whether the statement is true or false, and justify your answer. 

(a) Every set of data points has a unique least squares straight line fit. 

Answer: 

False 

(b) If the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are not collinear, then (1) is an inconsistent system.

Answer: 

True 

(c) If $y = a + bx$ is the least squares line fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then $d_i = |y_i - (a + bx_i)|$ is minimal for every $1 \le i \le n$.


Answer: 













False 


(d) If $y = a + bx$ is the least squares line fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then

$$\sum_{i=1}^{n} \left(y_i - (a + bx_i)\right)^2$$

is minimal.

Answer: 

True 




6.6 Function Approximation; Fourier Series 

In this section we will show how orthogonal projections can be used to approximate certain types of functions by simpler functions that are easier to work with. The ideas explained here have important applications in engineering and science. Calculus is required.


Best Approximations 

All of the problems that we will study in this section will be special cases of the following general problem. 


APPROXIMATION PROBLEM 

Given a function $f$ that is continuous on an interval $[a, b]$, find the “best possible approximation” to $f$ using only functions from a specified subspace $W$ of $C[a, b]$.


Here are some examples of such problems: 

Find the best possible approximation to $e^x$ over $[0, 1]$ by a polynomial of the form $a_0 + a_1 x + a_2 x^2$.

Find the best possible approximation to $\sin \pi x$ over $[-1, 1]$ by a function of the form $a_0 + a_1 e^x + a_2 e^{2x} + a_3 e^{3x}$.

Find the best possible approximation to $x$ over $[0, 2\pi]$ by a function of the form $a_0 + a_1 \sin x + a_2 \sin 2x + b_1 \cos x + b_2 \cos 2x$.

In the first example $W$ is the subspace of $C[0, 1]$ spanned by $1$, $x$, and $x^2$; in the second example $W$ is the subspace of $C[-1, 1]$ spanned by $1$, $e^x$, $e^{2x}$, and $e^{3x}$; and in the third example $W$ is the subspace of $C[0, 2\pi]$ spanned by $1$, $\sin x$, $\sin 2x$, $\cos x$, and $\cos 2x$.


Measurements of Error 

To solve approximation problems of the preceding types, we first need to make the phrase “best approximation over $[a, b]$” mathematically precise. To do this we will need some way of quantifying the error that results when one continuous function is approximated by another over an interval $[a, b]$. If we were to approximate $f(x)$ by $g(x)$, and if we were concerned only with the error in that approximation at a single point $x_0$, then it would be natural to define the error to be

$$\text{error} = |f(x_0) - g(x_0)|$$

sometimes called the deviation between $f$ and $g$ at $x_0$ (Figure 6.6.1). However, we are not concerned simply with measuring the error at a single point but rather with measuring it over the entire interval $[a, b]$. The problem is that an approximation may have small deviations in one part of the interval and large deviations in another. One possible way of accounting for this is to integrate the deviation $|f(x) - g(x)|$ over the interval $[a, b]$ and define the error over the interval to be

$$\text{error} = \int_a^b |f(x) - g(x)|\,dx \qquad (1)$$

Geometrically, (1) is the area between the graphs of $f(x)$ and $g(x)$ over the interval $[a, b]$ (Figure 6.6.2); the greater the area, the greater the overall error.



Figure 6.6.1 The deviation between $f$ and $g$ at $x_0$

Figure 6.6.2 The area between the graphs of $f$ and $g$ over $[a, b]$ measures the error in approximating $f$ by $g$ over $[a, b]$


Although (1) is natural and appealing geometrically, most mathematicians and scientists generally favor the following alternative measure of error, called the mean square error:

$$\text{mean square error} = \int_a^b [f(x) - g(x)]^2\,dx$$
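When the integral is awkward to evaluate by hand, the mean square error can be estimated numerically. This sketch uses composite Simpson's rule, a quadrature method chosen here for illustration (the text does not prescribe one); the helper names and the two test functions are hypothetical choices.

```python
import math


def simpson(h, a, b, n=1000):
    """Composite Simpson's rule for the integral of h over [a, b].

    n is the number of subintervals and must be even.
    """
    if n % 2:
        raise ValueError("n must be even")
    dx = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * dx)
    return total * dx / 3


def mean_square_error(f, g, a, b, n=1000):
    """Approximate the integral of [f(x) - g(x)]^2 over [a, b]."""
    return simpson(lambda x: (f(x) - g(x)) ** 2, a, b, n)


# Illustrative choices: f(x) = x approximated by g(x) = 0 on [0, 1]
# (exact value 1/3), and f(x) = sin x approximated by g(x) = 0 on [0, pi]
# (exact value pi/2).
print(mean_square_error(lambda x: x, lambda x: 0.0, 0.0, 1.0))
print(mean_square_error(math.sin, lambda x: 0.0, 0.0, math.pi))
```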


Mean square error emphasizes the effect of larger errors because of the squaring and has the added advantage that it allows us to bring to bear the theory of inner product spaces. To see how, suppose that $f$ is a continuous function on $[a, b]$ that we want to approximate by a function $g$ from a subspace $W$ of $C[a, b]$, and suppose that $C[a, b]$ is given the inner product

$$\langle f, g \rangle = \int_a^b f(x) g(x)\,dx$$

It follows that

$$\|f - g\|^2 = \langle f - g, f - g \rangle = \int_a^b [f(x) - g(x)]^2\,dx = \text{mean square error}$$

so minimizing the mean square error is the same as minimizing $\|f - g\|^2$. Thus the approximation problem posed informally at the beginning of this section can be restated more precisely as follows.


Least Squares Approximation 







LEAST SQUARES APPROXIMATION PROBLEM

Let $f$ be a function that is continuous on an interval $[a, b]$, let $C[a, b]$ have the inner product

$$\langle f, g \rangle = \int_a^b f(x) g(x)\,dx$$

and let $W$ be a finite-dimensional subspace of $C[a, b]$. Find a function $g$ in $W$ that minimizes

$$\|f - g\|^2 = \int_a^b [f(x) - g(x)]^2\,dx$$

Since $\|f - g\|^2$ and $\|f - g\|$ are minimized by the same function $g$, this problem is equivalent to looking for a function $g$ in $W$ that is closest to $f$. But we know from Theorem 6.4.1 that $g = \operatorname{proj}_W f$ is such a function (Figure 6.6.3).

Figure 6.6.3 $f$ = function in $C[a, b]$ to be approximated; $W$ = subspace of approximating functions; $g = \operatorname{proj}_W f$ = least squares approximation to $f$ from $W$

Thus, we have the following result.

THEOREM 6.6.1

If $f$ is a continuous function on $[a, b]$, and $W$ is a finite-dimensional subspace of $C[a, b]$, then the function $g$ in $W$ that minimizes the mean square error

$$\int_a^b [f(x) - g(x)]^2\,dx$$

is $g = \operatorname{proj}_W f$, where the orthogonal projection is relative to the inner product

$$\langle f, g \rangle = \int_a^b f(x) g(x)\,dx$$

The function $g = \operatorname{proj}_W f$ is called the least squares approximation to $f$ from $W$.
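Theorem 6.6.1 can also be applied numerically without first building an orthonormal basis. If $g = c_0 \phi_0 + \cdots + c_{k-1}\phi_{k-1}$ for a basis $\phi_i$ of $W$, then requiring $f - g$ to be orthogonal to every $\phi_i$ gives the linear system $\sum_j \langle \phi_i, \phi_j \rangle c_j = \langle f, \phi_i \rangle$. The sketch below uses the first problem listed at the start of this section ($e^x$ on $[0, 1]$, $W$ spanned by $1, x, x^2$). It assumes Simpson's rule for the integrals, and all function names are hypothetical.

```python
import math


def simpson(h, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    dx = (b - a) / n
    total = h(a) + h(b) + sum((4 if i % 2 else 2) * h(a + i * dx) for i in range(1, n))
    return total * dx / 3


def inner(f, g, a, b):
    """<f, g> = integral of f(x) g(x) over [a, b], approximated numerically."""
    return simpson(lambda x: f(x) * g(x), a, b)


def least_squares_approx(f, basis, a, b):
    """Coefficients c of g = sum(c[j] * basis[j]) with g = proj_W f.

    Orthogonality of f - g to each basis function gives the system
    G c = r, where G[i][j] = <basis[i], basis[j]> and r[i] = <f, basis[i]>.
    """
    k = len(basis)
    G = [[inner(basis[i], basis[j], a, b) for j in range(k)] for i in range(k)]
    r = [inner(f, basis[i], a, b) for i in range(k)]
    # Gaussian elimination; G is symmetric positive definite for an independent basis
    for i in range(k):
        for p in range(i + 1, k):
            m = G[p][i] / G[i][i]
            for j in range(i, k):
                G[p][j] -= m * G[i][j]
            r[p] -= m * r[i]
    c = [0.0] * k
    for i in reversed(range(k)):
        c[i] = (r[i] - sum(G[i][j] * c[j] for j in range(i + 1, k))) / G[i][i]
    return c


basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
c = least_squares_approx(math.exp, basis, 0.0, 1.0)
g = lambda x: c[0] + c[1] * x + c[2] * x * x
print(c)
# f - g should be (numerically) orthogonal to each basis function
print([inner(lambda x: math.exp(x) - g(x), phi, 0.0, 1.0) for phi in basis])
```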


Fourier Series 


A function of the form

$$T(x) = c_0 + c_1 \cos x + c_2 \cos 2x + \cdots + c_n \cos nx + d_1 \sin x + d_2 \sin 2x + \cdots + d_n \sin nx \qquad (2)$$

is called a trigonometric polynomial; if $c_n$ and $d_n$ are not both zero, then $T(x)$ is said to have order $n$. For example,

$$T(x) = 2 + \cos x - 3\cos 2x + 7\sin 4x$$

is a trigonometric polynomial of order 4 with

$$c_0 = 2,\ c_1 = 1,\ c_2 = -3,\ c_3 = 0,\ c_4 = 0,\ d_1 = 0,\ d_2 = 0,\ d_3 = 0,\ d_4 = 7$$

It is evident from (2) that the trigonometric polynomials of order $n$ or less are the various possible linear combinations of

$$1, \cos x, \cos 2x, \ldots, \cos nx, \sin x, \sin 2x, \ldots, \sin nx \qquad (3)$$

It can be shown that these $2n + 1$ functions are linearly independent and thus form a basis for a $(2n + 1)$-dimensional subspace of $C[a, b]$.

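Let us now consider the problem of finding the least squares approximation of a continuous function $f(x)$ over the interval $[0, 2\pi]$ by a trigonometric polynomial of order $n$ or less, that is, by a function in the subspace $W$ of $C[0, 2\pi]$ spanned by the functions in (3). As noted above, the least squares approximation to $f$ from $W$ is the orthogonal projection of $f$ on $W$. To find this orthogonal projection, we must find an orthonormal basis $g_0, g_1, \ldots, g_{2n}$ for $W$, after which we can compute the orthogonal projection on $W$ from the formula

$$\operatorname{proj}_W f = \langle f, g_0 \rangle g_0 + \langle f, g_1 \rangle g_1 + \cdots + \langle f, g_{2n} \rangle g_{2n} \qquad (4)$$

(see Theorem 6.3.4b). An orthonormal basis for $W$ can be obtained by applying the Gram-Schmidt process to the basis vectors in (3) using the inner product

$$\langle f, g \rangle = \int_0^{2\pi} f(x) g(x)\,dx$$

This yields the orthonormal basis

$$g_0 = \frac{1}{\sqrt{2\pi}}, \quad g_1 = \frac{1}{\sqrt{\pi}} \cos x, \ \ldots, \ g_n = \frac{1}{\sqrt{\pi}} \cos nx, \quad g_{n+1} = \frac{1}{\sqrt{\pi}} \sin x, \ \ldots, \ g_{2n} = \frac{1}{\sqrt{\pi}} \sin nx \qquad (5)$$

(see Exercise 6). If we introduce the notation

$$a_0 = \frac{2}{\sqrt{2\pi}} \langle f, g_0 \rangle, \quad a_1 = \frac{1}{\sqrt{\pi}} \langle f, g_1 \rangle, \ \ldots, \ a_n = \frac{1}{\sqrt{\pi}} \langle f, g_n \rangle$$
$$b_1 = \frac{1}{\sqrt{\pi}} \langle f, g_{n+1} \rangle, \ \ldots, \ b_n = \frac{1}{\sqrt{\pi}} \langle f, g_{2n} \rangle \qquad (6)$$

then on substituting (5) in (4), we obtain

$$\operatorname{proj}_W f = \frac{a_0}{2} + [a_1 \cos x + \cdots + a_n \cos nx] + [b_1 \sin x + \cdots + b_n \sin nx] \qquad (7)$$

where

$$a_0 = \frac{2}{\sqrt{2\pi}} \langle f, g_0 \rangle = \frac{2}{\sqrt{2\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{2\pi}}\,dx = \frac{1}{\pi} \int_0^{2\pi} f(x)\,dx$$

$$a_1 = \frac{1}{\sqrt{\pi}} \langle f, g_1 \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \cos x\,dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos x\,dx$$

$$\vdots$$

$$a_n = \frac{1}{\sqrt{\pi}} \langle f, g_n \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \cos nx\,dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos nx\,dx$$

$$b_1 = \frac{1}{\sqrt{\pi}} \langle f, g_{n+1} \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \sin x\,dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin x\,dx$$

$$\vdots$$

$$b_n = \frac{1}{\sqrt{\pi}} \langle f, g_{2n} \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \sin nx\,dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin nx\,dx$$

In short,

$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\,dx, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\,dx \qquad (8)$$

The numbers $a_0, a_1, \ldots, a_n, b_1, \ldots, b_n$ are called the Fourier coefficients of $f$.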
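Formula (8) can be checked numerically. This sketch computes the first few Fourier coefficients of a function on $[0, 2\pi]$; it assumes Simpson's rule for the integrals, and the helper names are hypothetical. For $f(x) = x$ it should reproduce $a_0 = 2\pi$, $a_k = 0$, and $b_k = -2/k$, the values Example 1 derives by integration by parts.

```python
import math


def simpson(h, a, b, n=4000):
    """Composite Simpson's rule with n (even) subintervals."""
    dx = (b - a) / n
    total = h(a) + h(b) + sum((4 if i % 2 else 2) * h(a + i * dx) for i in range(1, n))
    return total * dx / 3


def fourier_coefficients(f, n):
    """Return ([a_0, ..., a_n], [b_1, ..., b_n]) for f on [0, 2*pi], using
    a_k = (1/pi) * integral of f(x) cos(kx) and b_k = (1/pi) * integral of f(x) sin(kx)."""
    two_pi = 2 * math.pi
    a = [simpson(lambda x, k=k: f(x) * math.cos(k * x), 0.0, two_pi) / math.pi
         for k in range(n + 1)]
    b = [simpson(lambda x, k=k: f(x) * math.sin(k * x), 0.0, two_pi) / math.pi
         for k in range(1, n + 1)]
    return a, b


a, b = fourier_coefficients(lambda x: x, 3)
print([round(v, 6) for v in a])  # compare with a_0 = 2*pi, a_k = 0
print([round(v, 6) for v in b])  # compare with b_k = -2/k
```

The default argument `k=k` in each lambda binds the current value of `k`, so each integrand uses its own frequency.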

EXAMPLE 1 Least Squares Approximations

Find the least squares approximation of $f(x) = x$ on $[0, 2\pi]$ by

(a) a trigonometric polynomial of order 2 or less;

(b) a trigonometric polynomial of order $n$ or less.

Solution

(a)

$$a_0 = \frac{1}{\pi} \int_0^{2\pi} f(x)\,dx = \frac{1}{\pi} \int_0^{2\pi} x\,dx = 2\pi \qquad (9a)$$

For $k = 1, 2, \ldots$, integration by parts yields (verify)

$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\,dx = \frac{1}{\pi} \int_0^{2\pi} x \cos kx\,dx = 0 \qquad (9b)$$

$$b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\,dx = \frac{1}{\pi} \int_0^{2\pi} x \sin kx\,dx = -\frac{2}{k} \qquad (9c)$$

Thus, the least squares approximation to $x$ on $[0, 2\pi]$ by a trigonometric polynomial of order 2 or less is

$$x \approx \frac{a_0}{2} + a_1 \cos x + a_2 \cos 2x + b_1 \sin x + b_2 \sin 2x$$

or, from (9a), (9b), and (9c),

$$x \approx \pi - 2\sin x - \sin 2x$$

(b) The least squares approximation to $x$ on $[0, 2\pi]$ by a trigonometric polynomial of order $n$ or less is

$$x \approx \frac{a_0}{2} + [a_1 \cos x + \cdots + a_n \cos nx] + [b_1 \sin x + \cdots + b_n \sin nx]$$

or, from (9a), (9b), and (9c),

$$x \approx \pi - 2\left(\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n}\right)$$

The graphs of y = x and some of these approximations are shown in Figure 6.6.4. 



Figure 6.6.4 


It is natural to expect that the mean square error will diminish as the number of terms in the least squares approximation

$$f(x) \approx \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)$$

increases. It can be proved that for functions $f$ in $C[0, 2\pi]$, the mean square error approaches zero as $n \to +\infty$; this is denoted by writing

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx)$$

The right side of this equation is called the Fourier series for $f$ over the interval $[0, 2\pi]$. Such series are of major importance in engineering, science, and mathematics.
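The decrease of the mean square error in Example 1 can be observed numerically. The sketch below (hypothetical helper names) compares a Simpson's-rule estimate with the closed form $\frac{2\pi^3}{3} - 4\pi\sum_{k=1}^{n} \frac{1}{k^2}$. That closed form follows from $\|f - g\|^2 = \|f\|^2 - \|g\|^2$ for $g = \operatorname{proj}_W f$, together with $\|f\|^2 = 8\pi^3/3$, $\|1\|^2 = 2\pi$, and $\|\sin kx\|^2 = \pi$. Since $\sum 1/k^2 = \pi^2/6$, it tends to zero as $n \to \infty$.

```python
import math


def simpson(h, a, b, n=4000):
    """Composite Simpson's rule with n (even) subintervals."""
    dx = (b - a) / n
    total = h(a) + h(b) + sum((4 if i % 2 else 2) * h(a + i * dx) for i in range(1, n))
    return total * dx / 3


def approximation(x, n):
    """Order-n least squares approximation of f(x) = x on [0, 2*pi] from Example 1."""
    return math.pi - 2 * sum(math.sin(k * x) / k for k in range(1, n + 1))


def mse_numeric(n):
    """Simpson estimate of the mean square error of the order-n approximation."""
    return simpson(lambda x: (x - approximation(x, n)) ** 2, 0.0, 2 * math.pi)


def mse_closed_form(n):
    """2*pi^3/3 - 4*pi * (1 + 1/4 + ... + 1/n^2)."""
    return 2 * math.pi ** 3 / 3 - 4 * math.pi * sum(1 / k ** 2 for k in range(1, n + 1))


for n in (1, 2, 5, 10, 20):
    print(n, round(mse_numeric(n), 4), round(mse_closed_form(n), 4))
```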



Jean Baptiste Fourier (1768-1830) 

Fourier was a French mathematician and physicist who discovered 
the Fourier series and related ideas while working on problems of heat diffusion. This 
discovery was one of the most influential in the history of mathematics; it is the 
cornerstone of many fields of mathematical research and a basic tool in many branches 
of engineering. Fourier, a political activist during the French revolution, spent time in 
jail for his defense of many victims during the Terror. He later became a favorite of 
Napoleon and was named a baron. 

[Image: The Granger Collection, New York ] 


Concept Review 

Approximation of functions 
Mean square error 
Least squares approximation 
Trigonometric polynomial 
Fourier coefficients 
Fourier series 

Skills 

Find the least squares approximation of a function. 

Find the mean square error of the least squares approximation of a function. 
Compute the Fourier series of a function. 


Exercise Set 6.6 


1. Find the least squares approximation of $f(x) = 1 + x$ over the interval $[0, 2\pi]$ by

(a) a trigonometric polynomial of order 2 or less.

(b) a trigonometric polynomial of order $n$ or less.

Answer:

(a) $(1 + \pi) - 2\sin x - \sin 2x$

(b) $(1 + \pi) - 2\sum_{k=1}^{n} \frac{\sin kx}{k}$

2. Find the least squares approximation of $f(x) = x$ over the interval $[0, 2\pi]$ by

(a) a trigonometric polynomial of order 3 or less.

(b) a trigonometric polynomial of order $n$ or less.

3. (a) Find the least squares approximation of $x$ over the interval $[0, 1]$ by a function of the form $a + be^x$.

(b) Find the mean square error of the approximation.

Answer:

(a)

(b) $\frac{13}{12} + \frac{1 + e}{2(1 - e)}$

4. (a) Find the least squares approximation of $e^x$ over the interval $[0, 1]$ by a polynomial of the form $a_0 + a_1 x$.

(b) Find the mean square error of the approximation.

5. (a) Find the least squares approximation of $\sin \pi x$ over the interval $[-1, 1]$ by a polynomial of the form $a_0 + a_1 x + a_2 x^2$.

(b) Find the mean square error of the approximation.

Answer:

(a) $\frac{3}{\pi}x$

6. Use the Gram-Schmidt process to obtain the orthonormal basis (5) from the basis (3).

7. Carry out the integrations indicated in Formulas (9a), (9b), and (9c).

8. Find the Fourier series of $f(x) = \pi - x$ over the interval $[0, 2\pi]$.

9. Find the Fourier series of $f(x) = 1$, $0 < x < \pi$ and $f(x) = 0$, $\pi < x < 2\pi$ over the interval $[0, 2\pi]$.

Answer:

10. What is the Fourier series of $\sin(3x)$?

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) If a function $f$ in $C[a, b]$ is approximated by the function $g$, then the mean square error is the same as the area between the graphs of $f(x)$ and $g(x)$ over the interval $[a, b]$.

Answer:

False

(b) Given a finite-dimensional subspace $W$ of $C[a, b]$, the function $g = \operatorname{proj}_W f$ minimizes the mean square error.

Answer:

True

(c) $\{1, \cos x, \sin x, \cos 2x, \sin 2x\}$ is an orthogonal subset of the vector space $C[0, 2\pi]$ with respect to the inner product $\langle f, g \rangle = \int_0^{2\pi} f(x) g(x)\,dx$.

Answer:

True

(d) $\{1, \cos x, \sin x, \cos 2x, \sin 2x\}$ is an orthonormal subset of the vector space $C[0, 2\pi]$ with respect to the inner product $\langle f, g \rangle = \int_0^{2\pi} f(x) g(x)\,dx$.

Answer:

False

(e) $\{1, \cos x, \sin x, \cos 2x, \sin 2x\}$ is a linearly independent subset of $C[0, 2\pi]$.

Answer:

True




Supplementary Exercises 


1. Let $R^4$ have the Euclidean inner product.

(a) Find a vector in $R^4$ that is orthogonal to $\mathbf{u}_1 = (1, 0, 0, 0)$ and $\mathbf{u}_4 = (0, 0, 0, 1)$ and makes equal angles with $\mathbf{u}_2 = (0, 1, 0, 0)$ and $\mathbf{u}_3 = (0, 0, 1, 0)$.

(b) Find a vector $\mathbf{x} = (x_1, x_2, x_3, x_4)$ of length 1 that is orthogonal to $\mathbf{u}_1$ and $\mathbf{u}_4$ above and such that the cosine of the angle between $\mathbf{x}$ and $\mathbf{u}_2$ is twice the cosine of the angle between $\mathbf{x}$ and $\mathbf{u}_3$.

Answer:

(a) $(0, a, a, 0)$ with $a \neq 0$


2. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle$ is the Euclidean inner product on $R^n$, and if $A$ is an $n \times n$ matrix, then

$$\langle \mathbf{u}, A\mathbf{v} \rangle = \langle A^T \mathbf{u}, \mathbf{v} \rangle$$

[Hint: Use the fact that $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T \mathbf{u}$.]

3. Let $M_{22}$ have the inner product $\langle U, V \rangle = \operatorname{tr}(U^T V) = \operatorname{tr}(V^T U)$ that was defined in Example 6 of Section 6.1. Describe the orthogonal complement of

(a) the subspace of all diagonal matrices.

(b) the subspace of symmetric matrices.

Answer:

(a) The subspace of all matrices in $M_{22}$ with only zeros on the diagonal.

(b) The subspace of all skew-symmetric matrices in $M_{22}$.

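4. Let $A\mathbf{x} = \mathbf{0}$ be a system of $m$ equations in $n$ unknowns. Show that

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

is a solution of this system if and only if the vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is orthogonal to every row vector of $A$ with respect to the Euclidean inner product on $R^n$.

5. Use the Cauchy-Schwarz inequality to show that if $a_1, a_2, \ldots, a_n$ are positive real numbers, then

$$(a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \geq n^2$$

6. Show that if $\mathbf{x}$ and $\mathbf{y}$ are vectors in an inner product space and $c$ is any scalar, then

$$\|c\mathbf{x} + \mathbf{y}\|^2 = c^2 \|\mathbf{x}\|^2 + 2c \langle \mathbf{x}, \mathbf{y} \rangle + \|\mathbf{y}\|^2$$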

7. Let $R^3$ have the Euclidean inner product. Find two vectors of length 1 that are orthogonal to all three of the vectors $\mathbf{u}_1 = (1, 1, -1)$, $\mathbf{u}_2 = (-2, -1, 2)$, and $\mathbf{u}_3 = (-1, 0, 1)$.

Answer: 




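8. Find a weighted Euclidean inner product on $R^n$ such that the vectors

$$\begin{aligned} \mathbf{v}_1 &= (1, 0, 0, \ldots, 0) \\ \mathbf{v}_2 &= (0, \sqrt{2}, 0, \ldots, 0) \\ \mathbf{v}_3 &= (0, 0, \sqrt{3}, \ldots, 0) \\ &\;\;\vdots \\ \mathbf{v}_n &= (0, 0, 0, \ldots, \sqrt{n}) \end{aligned}$$

form an orthonormal set.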

9. Is there a weighted Euclidean inner product on R 2 for which the vectors (1,2) and (3, — 1) form an 
orthonormal set? Justify your answer. 


Answer: 


No 

10. If $\mathbf{u}$ and $\mathbf{v}$ are vectors in an inner product space $V$, then $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} - \mathbf{v}$ can be regarded as sides of a “triangle” in $V$ (see the accompanying figure). Prove that the law of cosines holds for any such triangle; that is,

$$\|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\|\mathbf{v}\| \cos \theta$$

where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$.

Figure Ex-10

11. (a) As shown in Figure 3.2.6, the vectors $(k, 0, 0)$, $(0, k, 0)$, and $(0, 0, k)$ form the edges of a cube in $R^3$ with diagonal $(k, k, k)$. Similarly, the vectors

$$(k, 0, 0, \ldots, 0),\ (0, k, 0, \ldots, 0),\ \ldots,\ (0, 0, 0, \ldots, k)$$

can be regarded as edges of a “cube” in $R^n$ with diagonal $(k, k, k, \ldots, k)$. Show that each of the above edges makes an angle of $\theta$ with the diagonal, where $\cos \theta = 1/\sqrt{n}$.

(b) Calculus required What happens to the angle $\theta$ in part (a) as the dimension of $R^n$ approaches $+\infty$?

Answer:

(b) $\theta$ approaches $\pi/2$.

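12. Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in an inner product space.

(a) Prove that $\|\mathbf{u}\| = \|\mathbf{v}\|$ if and only if $\mathbf{u} + \mathbf{v}$ and $\mathbf{u} - \mathbf{v}$ are orthogonal.

(b) Give a geometric interpretation of this result in $R^2$ with the Euclidean inner product.

13. Let $\mathbf{u}$ be a vector in an inner product space $V$, and let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for $V$. Show that if $\alpha_i$ is the angle between $\mathbf{u}$ and $\mathbf{v}_i$, then

$$\cos^2 \alpha_1 + \cos^2 \alpha_2 + \cdots + \cos^2 \alpha_n = 1$$

14. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle_1$ and $\langle \mathbf{u}, \mathbf{v} \rangle_2$ are two inner products on a vector space $V$, then the quantity $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle_1 + \langle \mathbf{u}, \mathbf{v} \rangle_2$ is also an inner product.

15. Prove Theorem 6.2.5.

16. Prove: If $A$ has linearly independent column vectors, and if $\mathbf{b}$ is orthogonal to the column space of $A$, then the least squares solution of $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = \mathbf{0}$.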

17. Is there any value of $s$ for which $x_1 = 1$ and $x_2 = 2$ is the least squares solution of the following linear system?

$$\begin{aligned} x_1 - x_2 &= 1 \\ 2x_1 + 3x_2 &= 1 \\ 4x_1 + 5x_2 &= s \end{aligned}$$

Explain your reasoning.

Answer:

18. Show that if $p$ and $q$ are distinct positive integers, then the functions $f(x) = \sin px$ and $g(x) = \sin qx$ are orthogonal with respect to the inner product

$$\langle f, g \rangle = \int_0^{2\pi} f(x) g(x)\,dx$$

19. Show that if $p$ and $q$ are positive integers, then the functions $f(x) = \cos px$ and $g(x) = \sin qx$ are orthogonal with respect to the inner product

$$\langle f, g \rangle = \int_0^{2\pi} f(x) g(x)\,dx$$


Quadratic Forms


CHAPTER CONTENTS 

Orthogonal Matrices 
Orthogonal Diagonalization 
Quadratic Forms 

Optimization Using Quadratic Forms 
Hermitian, Unitary, and Normal Matrices 


INTRODUCTION 

In Section 5.2 we found conditions that guaranteed the diagonalizability of an $n \times n$
matrix, but we did not consider what class or classes of matrices might actually satisfy 
those conditions. In this chapter we will show that every symmetric matrix is 
diagonalizable. This is an extremely important result because many applications utilize it 
in some essential way. 




7.1 Orthogonal Matrices 

In this section we will discuss the class of matrices whose inverses can be obtained by transposition. Such matrices occur in a variety of applications and also arise as transition matrices when one orthonormal basis is changed to another.


Orthogonal Matrices 

We begin with the following definition. 

DEFINITION 1

A square matrix $A$ is said to be orthogonal if its transpose is the same as its inverse, that is, if

$$A^{-1} = A^T$$

or, equivalently, if

$$AA^T = A^T A = I \qquad (1)$$

Recall from Theorem 1.6.3 that if either product in (1) holds, then so does the other. Thus, $A$ is orthogonal if either $AA^T = I$ or $A^T A = I$.


EXAMPLE 1 A 3 × 3 Orthogonal Matrix


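The matrix

$$A = \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix}$$

is orthogonal since

$$A^T A = \begin{bmatrix} \frac{3}{7} & -\frac{6}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{3}{7} & \frac{6}{7} \\ \frac{6}{7} & \frac{2}{7} & -\frac{3}{7} \end{bmatrix} \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$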
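The product above is easy to confirm in exact arithmetic. This sketch uses Python's `fractions` module with the entries of $A$ as displayed; the helper names `transpose` and `matmul` are hypothetical.

```python
from fractions import Fraction as F

A = [[F(3, 7), F(2, 7), F(6, 7)],
     [F(-6, 7), F(3, 7), F(2, 7)],
     [F(2, 7), F(6, 7), F(-3, 7)]]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


I3 = [[F(int(i == j)) for j in range(3)] for i in range(3)]
print(matmul(transpose(A), A) == I3, matmul(A, transpose(A)) == I3)  # True True
```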


EXAMPLE 2 Rotation and Reflection Matrices are Orthogonal 


Recall from Table 5 of Section 4.9 that the standard matrix for the counterclockwise rotation of $R^2$ through an angle $\theta$ is

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

This matrix is orthogonal for all choices of $\theta$ since

$$A^T A = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

We leave it for you to verify that the reflection matrices in Tables 1 and 2 and the rotation matrices in Table 6 of Section 4.9 are all orthogonal.


















Observe that for the orthogonal matrices in Example 1 and Example 2, both the row vectors and the column vectors form orthonormal sets with 
respect to the Euclidean inner product. This is a consequence of the following theorem. 


THEOREM 7.1.1 

The following are equivalent for an n x n matrix A. 

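(a) $A$ is orthogonal.

(b) The row vectors of $A$ form an orthonormal set in $R^n$ with the Euclidean inner product.

(c) The column vectors of $A$ form an orthonormal set in $R^n$ with the Euclidean inner product.

We will prove the equivalence of (a) and (b) and leave the equivalence of (a) and (c) as an exercise.

(a) ⇔ (b) The entry in the $i$th row and $j$th column of the matrix product $AA^T$ is the dot product of the $i$th row vector of $A$ and the $j$th column vector of $A^T$ (see Formula 5 of Section 1.3). But except for a difference in form, the $j$th column vector of $A^T$ is the $j$th row vector of $A$. Thus, if the row vectors of $A$ are $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n$, then the matrix product $AA^T$ can be expressed as

$$AA^T = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{r}_1 & \mathbf{r}_1 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{r}_n \\ \mathbf{r}_2 \cdot \mathbf{r}_1 & \mathbf{r}_2 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{r}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_n \cdot \mathbf{r}_1 & \mathbf{r}_n \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_n \cdot \mathbf{r}_n \end{bmatrix}$$

[see Formula 28 of Section 3.2]. Thus, it follows that $AA^T = I$ if and only if

$$\mathbf{r}_1 \cdot \mathbf{r}_1 = \mathbf{r}_2 \cdot \mathbf{r}_2 = \cdots = \mathbf{r}_n \cdot \mathbf{r}_n = 1$$

and

$$\mathbf{r}_i \cdot \mathbf{r}_j = 0 \quad \text{when } i \neq j$$

which are true if and only if $\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\}$ is an orthonormal set in $R^n$.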


WARNING 

Note that an orthogonal matrix is one with orthonormal rows and columns—not simply orthogonal rows and columns. 


The following theorem lists three more fundamental properties of orthogonal matrices. The proofs are all straightforward and are left as exercises. 


THEOREM 7.1.2 

(a) The inverse of an orthogonal matrix is orthogonal. 

(b) A product of orthogonal matrices is orthogonal. 

(c) If $A$ is orthogonal, then $\det(A) = 1$ or $\det(A) = -1$.

EXAMPLE 3 det(A) = ±1 for an Orthogonal Matrix A

The matrix

$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$

is orthogonal since its row (and column) vectors form orthonormal sets in $R^2$ with the Euclidean inner product. We leave it for you to verify that $\det(A) = 1$ and that interchanging the rows produces an orthogonal matrix whose determinant is $-1$.


Orthogonal Matrices as Linear Operators 

We observed in Example 2 that the standard matrices for the basic reflection and rotation operators on $R^2$ and $R^3$ are orthogonal. The next theorem
will explain why this is so. 


THEOREM 7.1.3 

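If $A$ is an $n \times n$ matrix, then the following are equivalent.

(a) $A$ is orthogonal.

(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $R^n$.

(c) $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $R^n$.

We will prove the sequence of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).

(a) ⇒ (b) Assume that $A$ is orthogonal, so that $A^T A = I$. It follows from Formula 26 of Section 3.2 that

$$\|A\mathbf{x}\| = (A\mathbf{x} \cdot A\mathbf{x})^{1/2} = (\mathbf{x} \cdot A^T A\mathbf{x})^{1/2} = (\mathbf{x} \cdot \mathbf{x})^{1/2} = \|\mathbf{x}\|$$

(b) ⇒ (c) Assume that $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $R^n$. From Theorem 3.2.7 we have

$$\begin{aligned} A\mathbf{x} \cdot A\mathbf{y} &= \tfrac{1}{4}\|A\mathbf{x} + A\mathbf{y}\|^2 - \tfrac{1}{4}\|A\mathbf{x} - A\mathbf{y}\|^2 = \tfrac{1}{4}\|A(\mathbf{x} + \mathbf{y})\|^2 - \tfrac{1}{4}\|A(\mathbf{x} - \mathbf{y})\|^2 \\ &= \tfrac{1}{4}\|\mathbf{x} + \mathbf{y}\|^2 - \tfrac{1}{4}\|\mathbf{x} - \mathbf{y}\|^2 = \mathbf{x} \cdot \mathbf{y} \end{aligned}$$

(c) ⇒ (a) Assume that $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $R^n$. It follows from Formula 26 of Section 3.2 that

$$\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^T A\mathbf{y}$$

which can be rewritten as $\mathbf{x} \cdot (A^T A\mathbf{y} - \mathbf{y}) = 0$ or as

$$\mathbf{x} \cdot (A^T A - I)\mathbf{y} = 0$$

Since this equation holds for all $\mathbf{x}$ in $R^n$, it holds in particular if $\mathbf{x} = (A^T A - I)\mathbf{y}$, so

$$(A^T A - I)\mathbf{y} \cdot (A^T A - I)\mathbf{y} = 0$$

Thus, it follows from the positivity axiom for inner products that

$$(A^T A - I)\mathbf{y} = \mathbf{0}$$

Since this equation is satisfied by every vector $\mathbf{y}$ in $R^n$, it must be that $A^T A - I$ is the zero matrix (why?) and hence that $A^T A = I$. Thus, $A$ is orthogonal.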
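Theorem 7.1.3 can be spot-checked numerically for a rotation matrix. The sketch below verifies parts (b) and (c) for a few vectors and shows that a non-orthogonal shear matrix fails part (b). The helper names, the angle 0.7, and the random test vectors are arbitrary choices.

```python
import math
import random


def rotation(theta):
    """Standard matrix of the counterclockwise rotation of R^2 through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


random.seed(0)
A = rotation(0.7)  # arbitrary angle
for _ in range(5):
    x = [random.uniform(-5, 5) for _ in range(2)]
    y = [random.uniform(-5, 5) for _ in range(2)]
    # Part (b): ||Ax|| = ||x||
    assert math.isclose(math.hypot(*apply(A, x)), math.hypot(*x))
    # Part (c): Ax . Ay = x . y
    assert math.isclose(dot(apply(A, x), apply(A, y)), dot(x, y), abs_tol=1e-12)

# A shear is not orthogonal; it changes the length of (0, 1) from 1 to sqrt(2)
shear = [[1.0, 1.0], [0.0, 1.0]]
print(math.hypot(*apply(shear, [0.0, 1.0])))  # about 1.414
```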


Theorem 7.1.3 has a useful geometric interpretation when considered from the viewpoint of matrix transformations: If $A$ is an orthogonal matrix and $T_A: R^n \to R^n$ is multiplication by $A$, then we will call $T_A$ an orthogonal operator on $R^n$. It follows from parts (a) and (b) of Theorem 7.1.3 that the orthogonal operators on $R^n$ are precisely those operators that leave the lengths of all vectors unchanged. This explains why, in Example 2, we found the standard matrices for the basic reflections and rotations of $R^2$ and $R^3$ to be orthogonal.

Parts (a) and (c) of Theorem 7.1.3 imply that orthogonal 
operators leave the angle between two vectors unchanged. Why? 


Change of Orthonormal Basis 


Orthonormal bases for inner product spaces are convenient because, as the following theorem shows, many familiar formulas hold for such bases. 
We leave the proof as an exercise. 


THEOREM 7.1.4 


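If $S$ is an orthonormal basis for an $n$-dimensional inner product space $V$, and if

$$(\mathbf{u})_S = (u_1, u_2, \ldots, u_n) \quad \text{and} \quad (\mathbf{v})_S = (v_1, v_2, \ldots, v_n)$$

then:

(a) $\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$

(b) $d(\mathbf{u}, \mathbf{v}) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$

(c) $\langle \mathbf{u}, \mathbf{v} \rangle = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$

Note that the three parts of Theorem 7.1.4 can be expressed as

$$\|\mathbf{u}\| = \|(\mathbf{u})_S\|, \qquad d(\mathbf{u}, \mathbf{v}) = d((\mathbf{u})_S, (\mathbf{v})_S), \qquad \langle \mathbf{u}, \mathbf{v} \rangle = \langle (\mathbf{u})_S, (\mathbf{v})_S \rangle$$

where the norm, distance, and inner product on the left sides are relative to the inner product on $V$ and on the right sides are relative to the Euclidean inner product on $R^n$.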
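Here is a small numerical illustration of Theorem 7.1.4 under stated assumptions: $V = R^2$ with the Euclidean inner product, and $S$ a hypothetical orthonormal basis obtained by rotating the standard basis through 30°. For an orthonormal basis the coordinates of $\mathbf{u}$ are the inner products $\langle \mathbf{u}, \mathbf{s}_i \rangle$, so $(\mathbf{u})_S$ can be computed directly; the vectors `u` and `v` are arbitrary.

```python
import math

# Hypothetical orthonormal basis S for R^2: the standard basis rotated 30 degrees.
t = math.pi / 6
S = [(math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def coords(u, basis):
    """(u)_S for an orthonormal basis: the i-th coordinate is <u, s_i>."""
    return [dot(u, s) for s in basis]


u, v = (3.0, -1.0), (0.5, 2.0)
uS, vS = coords(u, S), coords(v, S)
diff = [p - q for p, q in zip(u, v)]
diffS = [p - q for p, q in zip(uS, vS)]
print(math.hypot(*u), math.hypot(*uS))        # part (a): equal norms
print(math.hypot(*diff), math.hypot(*diffS))  # part (b): equal distances
print(dot(u, v), dot(uS, vS))                 # part (c): equal inner products
```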


Transitions between orthonormal bases for an inner product space are of special importance in geometry and various applications. The following 
theorem, whose proof is deferred to the end of this section, is concerned with transitions of this type. 


THEOREM 7.1.5 

Let $V$ be a finite-dimensional inner product space. If $P$ is the transition matrix from one orthonormal basis for $V$ to another orthonormal basis for $V$, then $P$ is an orthogonal matrix.


EXAMPLE 4 Rotation of Axes in 2-Space 


In many problems a rectangular xy-coordinate system is given, and a new x'y'-coordinate system is obtained by rotating the
xy-system counterclockwise about the origin through an angle θ. When this is done, each point Q in the plane has two sets of
coordinates: coordinates (x, y) relative to the xy-system and coordinates (x', y') relative to the x'y'-system (Figure 7.1.1a).



Figure 7.1.1 [panels (a) through (d)]


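By introducing unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ along the positive x- and y-axes and unit vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$ along the positive x'- and y'-axes,
we can regard this rotation as a change from an old basis $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ to a new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ (Figure 7.1.1b). Thus, the new
coordinates (x', y') and the old coordinates (x, y) of a point Q will be related by

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = P^{-1} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (2)$$

where P is the transition matrix from B' to B. To find P we must determine the coordinate matrices of the new basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$
relative to the old basis. As indicated in Figure 7.1.1c, the components of $\mathbf{u}_1'$ in the old basis are cos θ and sin θ, so

$$[\mathbf{u}_1']_B = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$

Similarly, from Figure 7.1.1d, we see that the components of $\mathbf{u}_2'$ in the old basis are $\cos(\theta + \pi/2) = -\sin\theta$ and
$\sin(\theta + \pi/2) = \cos\theta$, so

$$[\mathbf{u}_2']_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}$$

Thus the transition matrix from B' to B is

$$P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Observe that P is an orthogonal matrix, as expected, since B and B' are orthonormal bases. Thus

$$P^{-1} = P^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \qquad (3)$$

so (2) yields

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (4)$$

or, equivalently,

$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta \qquad (5)$$

These are sometimes called the rotation equations for $R^2$.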


EXAMPLE 5 Rotation of Axes in 2-Space 

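Use form (4) of the rotation equations for $R^2$ to find the new coordinates of the point $Q(2, -1)$ if the coordinate axes of a rectangular
coordinate system are rotated through an angle of $\theta = \pi/4$.


Solution Since

$$\sin\frac{\pi}{4} = \cos\frac{\pi}{4} = \frac{1}{\sqrt{2}}$$

the equation in (4) becomes

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

Thus, if the old coordinates of a point Q are $(x, y) = (2, -1)$, then

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{3}{\sqrt{2}} \end{bmatrix}$$

so the new coordinates of Q are $(x', y') = \left(\frac{1}{\sqrt{2}}, -\frac{3}{\sqrt{2}}\right)$.


Observe that the coefficient matrix in (4) is the same as the standard matrix for the linear operator that rotates the vectors of $R^2$ through
the angle $-\theta$ (see margin note for Table 5 of Section 4.9). This is to be expected since rotating the coordinate axes through the angle θ with the
vectors of $R^2$ kept fixed has the same effect as rotating the vectors in $R^2$ through the angle $-\theta$ with the axes kept fixed.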
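The rotation equations (5) translate directly into code. This Python sketch uses only the standard library, and its function names are hypothetical. It reproduces Example 5 and also inverts the change of coordinates by multiplying by P instead of $P^T$.

```python
import math

def rotate_axes(x, y, theta):
    """New coordinates (x', y') of a point after the axes are rotated
    counterclockwise through theta, using the rotation equations (5)."""
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + y * s, -x * s + y * c)

def unrotate_axes(xp, yp, theta):
    """Old coordinates (x, y) from new ones: multiply by P instead of P^T."""
    c, s = math.cos(theta), math.sin(theta)
    return (xp * c - yp * s, xp * s + yp * c)

# Example 5: Q has old coordinates (2, -1) and theta = pi/4.
xp, yp = rotate_axes(2.0, -1.0, math.pi / 4)
print(xp, yp)                                # 1/sqrt(2) and -3/sqrt(2), up to rounding
print(unrotate_axes(xp, yp, math.pi / 4))    # recovers (2, -1), up to rounding
```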























EXAMPLE 6 Application to Rotation of Axes in 3-Space 


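Suppose that a rectangular xyz-coordinate system is rotated around its z-axis counterclockwise (looking down the positive z-axis)
through an angle θ (Figure 7.1.2). If we introduce unit vectors $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ along the positive x-, y-, and z-axes and unit vectors $\mathbf{u}_1'$,
$\mathbf{u}_2'$, and $\mathbf{u}_3'$ along the positive x'-, y'-, and z'-axes, we can regard the rotation as a change from the old basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ to the
new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2', \mathbf{u}_3'\}$. In light of Example 4, it should be evident that

$$[\mathbf{u}_1']_B = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2']_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix}$$

Moreover, since $\mathbf{u}_3'$ extends 1 unit up the positive z'-axis,

$$[\mathbf{u}_3']_B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$


Figure 7.1.2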


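It follows that the transition matrix from B' to B is

$$P = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and the transition matrix from B to B' is

$$P^{-1} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(verify). Thus, the new coordinates (x', y', z') of a point Q can be computed from its old coordinates (x, y, z) by

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$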
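The following is a similar sketch for Example 6, again in plain Python. The helper names are hypothetical, and the angle and the point are chosen arbitrarily. It builds P, takes $P^{-1} = P^T$ because P is orthogonal, confirms $PP^T = I$ numerically, and converts coordinates in both directions.

```python
import math

def rotation_about_z(theta):
    """Transition matrix P from B' to B when the xyz-axes are rotated
    counterclockwise about the z-axis through theta (Example 6)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

theta = math.pi / 3                  # an arbitrary angle for illustration
P = rotation_about_z(theta)
P_inv = transpose(P)                 # P is orthogonal, so P^{-1} = P^T
check = matmul(P, P_inv)             # identity matrix, up to rounding

old = [1.0, 2.0, 3.0]                # old coordinates (x, y, z) of a point Q
new = matvec(P_inv, old)             # new coordinates (x', y', z')
print(new)
print(matvec(P, new))                # back to (1, 2, 3), up to rounding
```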


OPTIONAL 

We conclude this section with an optional proof of Theorem 7.1.5. 

Assume that V is an n-dimensional inner product space and that P is the transition matrix from an orthonormal basis
B' to an orthonormal basis B. We will denote the norm relative to the inner product on V by the symbol $\|\cdot\|_V$ to distinguish it from the norm
relative to the Euclidean inner product on $R^n$, which we will denote by $\|\cdot\|$.


Recall that $(\mathbf{u})_S$ denotes a coordinate vector expressed in
comma-delimited form whereas $[\mathbf{u}]_S$ denotes a coordinate vector
expressed in column form.

To prove that P is orthogonal, we will use Theorem 7.1.3 and show that $\|P\mathbf{x}\| = \|\mathbf{x}\|$ for every vector x in $R^n$. As a first step in this direction,
recall from Theorem 7.1.4(a) that for any orthonormal basis for V the norm of any vector u in V is the same as the norm of its coordinate vector with
respect to the Euclidean inner product, that is

$$\|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|[\mathbf{u}]_B\|$$

or, since $[\mathbf{u}]_B = P[\mathbf{u}]_{B'}$,

$$\|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|P[\mathbf{u}]_{B'}\| \qquad (6)$$

Now let x be any vector in $R^n$, and let u be the vector in V whose coordinate vector with respect to the basis B' is x; that is, $[\mathbf{u}]_{B'} = \mathbf{x}$. Thus, from
(6),

$$\|\mathbf{u}\|_V = \|\mathbf{x}\| = \|P\mathbf{x}\|$$

which proves that P is orthogonal.


Concept Review 

Orthogonal matrix 
Orthogonal operator 
Properties of orthogonal matrices. 

Geometric properties of an orthogonal operator 

Properties of transition matrices from one orthonormal basis to another. 

Skills 

Be able to identify an orthogonal matrix. 

Know the possible values for the determinant of an orthogonal matrix. 
Find the new coordinates of a point resulting from a rotation of axes. 


Exercise Set 7.1 

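1. (a) Show that the matrix

$$A = \begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix}$$

is orthogonal in three ways: by calculating $A^TA$, by using part (b) of Theorem 7.1.1, and by using part (c) of Theorem 7.1.1.
(b) Find the inverse of the matrix A in part (a).


Answer:


(b) $$A^{-1} = \begin{bmatrix} \frac{4}{5} & -\frac{9}{25} & \frac{12}{25} \\ 0 & \frac{4}{5} & \frac{3}{5} \\ -\frac{3}{5} & -\frac{12}{25} & \frac{16}{25} \end{bmatrix}$$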

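2. (a) Show that the matrix

$$A = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & -\frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & -\frac{2}{3} \end{bmatrix}$$

is orthogonal.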








(b) Let $T: R^3 \to R^3$ be multiplication by the matrix A in part (a). Find $T(\mathbf{x})$ for the vector $\mathbf{x} = (-2, 3, 5)$, and using the Euclidean inner product
on $R^3$, verify that $\|T(\mathbf{x})\| = \|\mathbf{x}\|$.

3. Determine which of the following matrices are orthogonal. For those that are orthogonal, find the inverse. 


(d) 


(e) 


(f> 


(a) 

'1 O' 



.0 


(b) 

" 1 

1 


f2 

~f2 


1 

1 


f2 

f2 

(c) 

0 1 

1 


h 


1 0 

0 


0 0 

1 


f2 


1 1 1 

{2 /6 {3 

_ 2 _ 1 

Ve {3 

1 1 1 

1/2/61/3 


0 


1 

2 

1 

2 

1 

2 

1 

2 

1 0 


1 

2 

1 

6 

1 

6 

5 

"6 

0 0 


0 -4= 4 ° 


ft 


0 -j= 


f3 

f3 


0 1 


0 -L I 0 


Answer: 


(a) 

(b) 

(d) 


1 0 
0 1 

{2 {i 

_1_ J_ 

{2 {2. 


,J_ 0 -L 

f2 f2 

J__2_ J_ 

1 _1 _ 1 _ 

{3 {3 {3 





(e) 


1 

2 

1 

2 

1 

2 

1 

2 


1 

2 

5 

6 

1 

6 

1 

6 


1 

2 

1 

6 

I 

6 

5 

6 


1 

2 

1 

6 

5 

6 

1 

6 


4. Prove that if A is orthogonal, then $A^T$ is orthogonal.

5. Verify that the reflection matrices in Tables 1 and 2 of Section 4.9 are orthogonal.

6. Let a rectangular x'y'-coordinate system be obtained by rotating a rectangular xy-coordinate system counterclockwise through the angle
$\theta = 3\pi/4$.

(a) Find the x'y'-coordinates of the point whose xy-coordinates are $(-2, 6)$.

(b) Find the xy-coordinates of the point whose x'y'-coordinates are $(5, 2)$.


7. Repeat Exercise 6 with $\theta = \pi/3$.


Answer: 


(a) $(-1 + 3\sqrt{3},\ 3 + \sqrt{3})$

(b) $\left(\frac{5}{2} - \sqrt{3},\ \frac{5\sqrt{3}}{2} + 1\right)$

8. Let a rectangular x'y'z'-coordinate system be obtained by rotating a rectangular xyz-coordinate system counterclockwise about the z-axis
(looking down the z-axis) through the angle $\theta = \pi/4$.

(a) Find the x'y'z'-coordinates of the point whose xyz-coordinates are $(-1, 2, 5)$.

(b) Find the xyz-coordinates of the point whose x'y'z'-coordinates are $(1, 6, -3)$.

9. Repeat Exercise 8 for a rotation of $\theta = \pi/3$ counterclockwise about the y-axis (looking along the positive y-axis toward the origin).

Answer: 



10. Repeat Exercise 8 for a rotation of $\theta = 3\pi/4$ counterclockwise about the x-axis (looking along the positive x-axis toward the origin).

11. (a) A rectangular x'y'z'-coordinate system is obtained by rotating an xyz-coordinate system counterclockwise about the y-axis through an
angle θ (looking along the positive y-axis toward the origin). Find a matrix A such that

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

where (x, y, z) and (x', y', z') are the coordinates of the same point in the xyz- and x'y'z'-systems, respectively.

(b) Repeat part (a) for a rotation about the x-axis.



Answer: 


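(a) $A = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix}$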


12. A rectangular x''y''z''-coordinate system is obtained by first rotating a rectangular xyz-coordinate system 60° counterclockwise about the
z-axis (looking down the positive z-axis) to obtain an x'y'z'-coordinate system, and then rotating the x'y'z'-coordinate system 45°
counterclockwise about the y'-axis (looking along the positive y'-axis toward the origin). Find a matrix A such that

$$\begin{bmatrix} x'' \\ y'' \\ z'' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

where (x, y, z) and (x'', y'', z'') are the xyz- and x''y''z''-coordinates of the same point.

13. What conditions must a and b satisfy for the matrix

$$\begin{bmatrix} a + b & b - a \\ a - b & b + a \end{bmatrix}$$

to be orthogonal?


Answer:


$a^2 + b^2 = \frac{1}{2}$

14. Prove that a 2 × 2 orthogonal matrix A has only one of two possible forms:

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{or}\quad A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$

where $0 \le \theta < 2\pi$. [Hint: Start with a general 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, and use the fact that the column vectors form an orthonormal set in $R^2$.]

15. (a) Use the result in Exercise 14 to prove that multiplication by a 2 × 2 orthogonal matrix is either a rotation or a reflection about the
x-axis followed by a rotation.

(b) Prove that multiplication by A is a rotation if $\det(A) = 1$ and that it is a reflection about the x-axis followed by a rotation if $\det(A) = -1$.


16. Use the result in Exercise 15 to determine whether multiplication by A is a rotation or a reflection about the x-axis followed by a rotation.
Find the angle of rotation in either case.


j__i_ 

{2 {2 


(b) 



2 


1 

2 


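17. Find a, b, and c for which the matrix

$$\begin{bmatrix} a & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ b & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ c & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix}$$

is orthogonal. Are the values of a, b, and c unique? Explain.


Answer:


The only possibilities are $a = 0$, $b = \frac{2}{\sqrt{6}}$, $c = -\frac{1}{\sqrt{3}}$ or $a = 0$, $b = -\frac{2}{\sqrt{6}}$, $c = \frac{1}{\sqrt{3}}$.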



18. The result in Exercise 15 has an analog for 3 × 3 orthogonal matrices: It can be proved that multiplication by a 3 × 3 orthogonal matrix A is a
rotation about some axis if $\det(A) = 1$ and is a rotation about some axis followed by a reflection about some coordinate plane if $\det(A) = -1$.
Determine whether multiplication by A is a rotation or a rotation followed by a reflection.


(a) 


A = 


1 1 

7 7 

6 3 

7 7 
2 6 
7 7 


6 

7 

2 

7 

3 

7 



(b) 


2 3 

7 7 


6 2 

7 7 

19. Use the fact stated in Exercise 18 and part (b) of Theorem 7.1.2 to show that a composition of rotations can always be accomplished by a single
rotation about some appropriate axis.

20. Prove the equivalence of statements (a) and (c) in Theorem 7.1.1.

21. A linear operator on $R^2$ is called rigid if it does not change the lengths of vectors, and it is called angle preserving if it does not change the
angle between nonzero vectors.

(a) Name two different types of linear operators that are rigid.

(b) Name two different types of linear operators that are angle preserving.

(c) Are there any linear operators on $R^2$ that are rigid and not angle preserving? Angle preserving and not rigid? Justify your answer.

Answer:

(a) Rotations about the origin, reflections about any line through the origin, and any combination of these

(b) Rotations about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these

(c) No; dilations and contractions

True-False Exercises


In parts (a)-(h) determine whether the statement is true or false, and justify your answer.


(a) The matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$ is orthogonal.


Answer: 

False 

(b) The matrix $\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$ is orthogonal.


Answer: 

False 

(c) An $m \times n$ matrix A is orthogonal if $A^TA = I$.
Answer: 

False 

(d) A square matrix whose columns form an orthogonal set is orthogonal. 
Answer: 

False 

(e) Every orthogonal matrix is invertible. 

Answer: 

True 

(f) If A is an orthogonal matrix, then $A^2$ is orthogonal and $(\det A)^2 = 1$.


Answer: 

True 


(g) Every eigenvalue of an orthogonal matrix has absolute value 1. 








Answer: 


True 

(h) If A is a square matrix and $\|A\mathbf{u}\| = 1$ for all unit vectors u, then A is orthogonal.
Answer: 

True 




7.2 Orthogonal Diagonalization 

In this section we will be concerned with the problem of diagonalizing a symmetric matrix A. As we will see, this problem is 
closely related to that of finding an orthonormal basis for R n that consists of eigenvectors of A. Problems of this type are 
important because many of the matrices that arise in applications are symmetric. 


The Orthogonal Diagonalization Problem 

In Definition 1 of Section 5.2 we defined two square matrices, A and B, to be similar if there is an invertible matrix P such
that $P^{-1}AP = B$. In this section we will be concerned with the special case in which it is possible to find an orthogonal
matrix P for which this relationship holds.

We begin with the following definition.


DEFINITION 1

If A and B are square matrices, then we say that A and B are orthogonally similar if there is an orthogonal matrix P
such that $P^TAP = B$.


If A is orthogonally similar to some diagonal matrix, say

$$P^TAP = D$$

then we say that A is orthogonally diagonalizable and that P orthogonally diagonalizes A.

Our first goal in this section is to determine what conditions a matrix must satisfy to be orthogonally diagonalizable. As a
first step, observe that there is no hope of orthogonally diagonalizing a matrix that is not symmetric. To see why this is so,
suppose that

$$P^TAP = D \qquad (1)$$

where P is an orthogonal matrix and D is a diagonal matrix. Multiplying both sides of (1) on the left by P and on the right by $P^T$, and then
using the fact that $PP^T = P^TP = I$, we can rewrite this equation as

$$A = PDP^T \qquad (2)$$

Now transposing both sides of this equation and using the fact that a diagonal matrix is the same as its transpose, we obtain

$$A^T = (PDP^T)^T = (P^T)^TD^TP^T = PDP^T = A$$

so A must be symmetric.


Conditions for Orthogonal Diagonalizability 


The following theorem shows that every symmetric matrix is, in fact, orthogonally diagonalizable. In this theorem, and for 
the remainder of this section, orthogonal will mean orthogonal with respect to the Euclidean inner product on R n . 


THEOREM 7.2.1 


If A is an n x n matrix, then the following are equivalent. 

(a) A is orthogonally diagonalizable. 

(b) A has an orthonormal set of n eigenvectors. 

(c) A is symmetric. 


Proof (a) ⇒ (b) Since A is orthogonally diagonalizable, there is an orthogonal matrix P such that $P^{-1}AP$ is diagonal. As shown in
the proof of Theorem 5.2.1, the n column vectors of P are eigenvectors of A. Since P is orthogonal, these column vectors are
orthonormal, so A has n orthonormal eigenvectors.

(b) ⇒ (a) Assume that A has an orthonormal set of n eigenvectors $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}$. As shown in the proof of Theorem 5.2.1,
the matrix P with these eigenvectors as columns diagonalizes A. Since these eigenvectors are orthonormal, P is orthogonal
and thus orthogonally diagonalizes A.

(a) ⇒ (c) In the proof that (a) ⇒ (b) we showed that an orthogonally diagonalizable n × n matrix A is orthogonally
diagonalized by an n × n matrix P whose columns form an orthonormal set of eigenvectors of A. Let D be the diagonal
matrix

$$D = P^TAP$$

from which it follows that

$$A = PDP^T$$

Thus,

$$A^T = (PDP^T)^T = PD^TP^T = PDP^T = A$$

which shows that A is symmetric.

(c) ⇒ (a) The proof of this part is beyond the scope of this text and will be omitted.


Properties of Symmetric Matrices 

Our next goal is to devise a procedure for orthogonally diagonalizing a symmetric matrix, but before we can do so, we need 
the following critical theorem about eigenvalues and eigenvectors of symmetric matrices. 


THEOREM 7.2.2 

If A is a symmetric matrix, then: 

(a) The eigenvalues of A are all real numbers.

(b) Eigenvectors from different eigenspaces are orthogonal. 


Part (a), which requires results about complex vector spaces, will be discussed in Section 7.5. 


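Proof (b) Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix A. We want to show
that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. Our proof of this involves the trick of starting with the expression $A\mathbf{v}_1 \cdot \mathbf{v}_2$. It follows from Formula 26 of
Section 3.2 and the symmetry of A that

$$A\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot A^T\mathbf{v}_2 = \mathbf{v}_1 \cdot A\mathbf{v}_2 \qquad (3)$$

But $\mathbf{v}_1$ is an eigenvector of A corresponding to $\lambda_1$ and $\mathbf{v}_2$ is an eigenvector of A corresponding to $\lambda_2$, so (3) yields the
relationship

$$\lambda_1\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot \lambda_2\mathbf{v}_2$$

which can be rewritten as

$$(\lambda_1 - \lambda_2)(\mathbf{v}_1 \cdot \mathbf{v}_2) = 0 \qquad (4)$$

But $\lambda_1 - \lambda_2 \neq 0$, since $\lambda_1$ and $\lambda_2$ were assumed distinct. Thus, it follows from (4) that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$.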
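The following is a small numerical illustration of part (b) in Python. The 2 × 2 matrices and their eigenvectors are illustrative choices that do not come from the text. The second, nonsymmetric matrix shows that without symmetry, eigenvectors for distinct eigenvalues need not be orthogonal.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def is_eigenpair(M, lam, v, tol=1e-12):
    """True if M v = lam v, componentwise within tol."""
    return all(abs(a - lam * b) <= tol for a, b in zip(matvec(M, v), v))

# Symmetric matrix: eigenvectors for the distinct eigenvalues 3 and 1.
S = [[2, 1], [1, 2]]
s1, s2 = [1, 1], [1, -1]
print(is_eigenpair(S, 3, s1), is_eigenpair(S, 1, s2), dot(s1, s2))   # True True 0

# Nonsymmetric matrix: the conclusion of part (b) can fail.
N = [[1, 1], [0, 2]]
n1, n2 = [1, 0], [1, 1]
print(is_eigenpair(N, 1, n1), is_eigenpair(N, 2, n2), dot(n1, n2))   # True True 1
```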


Theorem 7.2.2 yields the following procedure for orthogonally diagonalizing a symmetric matrix. 


Orthogonally Diagonalizing an n × n Symmetric Matrix

Step 1 Find a basis for each eigenspace of A.

Step 2 Apply the Gram-Schmidt process to each of these bases to obtain an orthonormal basis for each eigenspace.

Step 3 Form the matrix P whose columns are the vectors constructed in Step 2. This matrix will orthogonally
diagonalize A, and the eigenvalues on the diagonal of $D = P^TAP$ will be in the same order as their corresponding
eigenvectors in P.

The justification of this procedure should be clear: Theorem 7.2.2 ensures that eigenvectors from different 
eigenspaces are orthogonal, and applying the Gram-Schmidt process ensures that the eigenvectors within the same 
eigenspace are orthonormal. It follows that the entire set of eigenvectors obtained by this procedure will be orthonormal. 


EXAMPLE 1 Orthogonally Diagonalizing a Symmetric Matrix 


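Find an orthogonal matrix P that diagonalizes

$$A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$$

We leave it for you to verify that the characteristic equation of A is

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - 4 & -2 & -2 \\ -2 & \lambda - 4 & -2 \\ -2 & -2 & \lambda - 4 \end{bmatrix} = (\lambda - 2)^2(\lambda - 8) = 0$$

Thus, the distinct eigenvalues of A are $\lambda = 2$ and $\lambda = 8$. By the method used in Example 7 of Section 5.1, it
can be shown that

$$\mathbf{u}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \mathbf{u}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \qquad (5)$$

form a basis for the eigenspace corresponding to $\lambda = 2$. Applying the Gram-Schmidt process to $\{\mathbf{u}_1, \mathbf{u}_2\}$
yields the following orthonormal eigenvectors (verify):

$$\mathbf{v}_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}$$

The eigenspace corresponding to $\lambda = 8$ has

$$\mathbf{u}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

as a basis. Applying the Gram-Schmidt process to $\{\mathbf{u}_3\}$ (i.e., normalizing $\mathbf{u}_3$) yields

$$\mathbf{v}_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}$$

Finally, using $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ as column vectors, we obtain

$$P = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}$$

which orthogonally diagonalizes A. As a check, we leave it for you to confirm that

$$P^TAP = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{bmatrix}$$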
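The three-step procedure can be sketched in Python for the matrix of this example. Step 1 takes the eigenspace bases found above as given rather than computing them. `gram_schmidt` is a hypothetical helper that implements the classical Gram-Schmidt process, and the final check forms $P^TAP$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:
            c = dot(v, q)                      # component of v along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        length = math.sqrt(dot(w, w))
        basis.append([wi / length for wi in w])
    return basis

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]
# Step 1: eigenspace bases from Example 1 (eigenvalue -> basis vectors).
eigenspaces = {2: [[-1, 1, 0], [-1, 0, 1]], 8: [[1, 1, 1]]}

# Step 2: orthonormalize each eigenspace separately.
columns, eigenvalues = [], []
for lam, vecs in eigenspaces.items():
    for q in gram_schmidt(vecs):
        columns.append(q)
        eigenvalues.append(lam)

# Step 3: the orthonormal eigenvectors become the columns of P.
P = transpose(columns)
D = matmul(matmul(transpose(P), A), P)
for row in D:
    print([round(x, 10) for x in row])        # diag(2, 2, 8), up to rounding
```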


Spectral Decomposition 

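If A is a symmetric matrix that is orthogonally diagonalized by

$$P = [\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_n] \qquad (6)$$

and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A corresponding to the unit eigenvectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$, then we know that
$D = P^TAP$, where D is a diagonal matrix with the eigenvalues in the diagonal positions. It follows from this that the matrix
A can be expressed as

$$A = PDP^T = [\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_n] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} = [\lambda_1\mathbf{u}_1 \;\; \lambda_2\mathbf{u}_2 \;\; \cdots \;\; \lambda_n\mathbf{u}_n] \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix}$$

Multiplying out, we obtain the formula

$$A = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \lambda_2\mathbf{u}_2\mathbf{u}_2^T + \cdots + \lambda_n\mathbf{u}_n\mathbf{u}_n^T \qquad (7)$$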


which is called a spectral decomposition of A. 
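Formula (7) can be checked numerically by summing $\lambda\mathbf{u}\mathbf{u}^T$ over the eigenpairs found in Example 1. The following is a plain-Python sketch with hypothetical helper names. It should rebuild A up to rounding.

```python
import math

def outer(u, v):
    """The n x n matrix u v^T."""
    return [[a * b for b in v] for a in u]

def spectral_sum(pairs):
    """Sum of lam * u u^T over (eigenvalue, unit eigenvector) pairs, as in (7)."""
    n = len(pairs[0][1])
    A = [[0.0] * n for _ in range(n)]
    for lam, u in pairs:
        uuT = outer(u, u)
        for i in range(n):
            for j in range(n):
                A[i][j] += lam * uuT[i][j]
    return A

r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
# Eigenvalues and orthonormal eigenvectors v1, v2, v3 from Example 1.
pairs = [
    (2, [-1 / r2, 1 / r2, 0.0]),
    (2, [-1 / r6, -1 / r6, 2 / r6]),
    (8, [1 / r3, 1 / r3, 1 / r3]),
]
A = spectral_sum(pairs)
for row in A:
    print([round(x, 10) for x in row])   # [[4, 2, 2], [2, 4, 2], [2, 2, 4]] up to rounding
```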

Note that each term of the spectral decomposition of A has the form $\lambda\mathbf{u}\mathbf{u}^T$, where u is a unit eigenvector of A in column
form, and λ is an eigenvalue of A corresponding to u. Since u has size n × 1, it follows that the product $\mathbf{u}\mathbf{u}^T$ has size n × n. It
can be proved (though we will not do it) that $\mathbf{u}\mathbf{u}^T$ is the standard matrix for the orthogonal projection of $R^n$ on the subspace
spanned by the vector u. Accepting this to be so, the spectral decomposition of A tells us that the image of a vector x under
multiplication by a symmetric matrix A can be obtained by projecting x orthogonally on the lines (one-dimensional
subspaces) determined by the eigenvectors of A, then scaling those projections by the eigenvalues, and then adding the scaled
projections. Here is an example.

EXAMPLE 2 A Geometric Interpretation of a Spectral Decomposition 


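The matrix

$$A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}$$

has eigenvalues $\lambda_1 = -3$ and $\lambda_2 = 2$ with corresponding eigenvectors

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad\text{and}\quad \mathbf{x}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

(verify). Normalizing these basis vectors yields

$$\mathbf{u}_1 = \frac{\mathbf{x}_1}{\|\mathbf{x}_1\|} = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix} \quad\text{and}\quad \mathbf{u}_2 = \frac{\mathbf{x}_2}{\|\mathbf{x}_2\|} = \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}$$

so a spectral decomposition of A is

$$\begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \lambda_2\mathbf{u}_2\mathbf{u}_2^T = (-3)\begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \end{bmatrix} + (2)\begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix} = (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix} \qquad (8)$$

where, as noted above, the 2 × 2 matrices on the right side of (8) are the standard matrices for the orthogonal
projections onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$, respectively.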

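Now let us see what this spectral decomposition tells us about the image of the vector $\mathbf{x} = (1, 1)$ under
multiplication by A. Writing x in column form, it follows that

$$A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \qquad (9)$$

and from (8) that

$$A\mathbf{x} = (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = (-3)\begin{bmatrix} -\frac{1}{5} \\ \frac{2}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{6}{5} \\ \frac{3}{5} \end{bmatrix} = \begin{bmatrix} \frac{3}{5} \\ -\frac{6}{5} \end{bmatrix} + \begin{bmatrix} \frac{12}{5} \\ \frac{6}{5} \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \qquad (10)$$

Formulas (9) and (10) provide two different ways of viewing the image of the vector (1, 1) under multiplication by
A: Formula (9) tells us directly that the image of this vector is (3, 0), whereas Formula (10) tells us that this image
can also be obtained by projecting (1, 1) onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$ to obtain
the vectors $\left(-\frac{1}{5}, \frac{2}{5}\right)$ and $\left(\frac{6}{5}, \frac{3}{5}\right)$, then scaling by the eigenvalues to obtain $\left(\frac{3}{5}, -\frac{6}{5}\right)$ and $\left(\frac{12}{5}, \frac{6}{5}\right)$, and then
adding these vectors (see Figure 7.2.1).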
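The projection view of Formula (10) can be reproduced in a few lines of Python, using the eigenvalues and unit eigenvectors of Example 2. The names are hypothetical. The code projects x with each matrix $\mathbf{u}\mathbf{u}^T$, scales each projection by its eigenvalue, and adds the results.

```python
import math

def outer(u, v):
    return [[a * b for b in v] for a in u]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

r5 = math.sqrt(5)
A = [[1, 2], [2, -2]]
# Eigenvalues and unit eigenvectors from Example 2.
pairs = [(-3, [1 / r5, -2 / r5]), (2, [2 / r5, 1 / r5])]

x = [1.0, 1.0]
projections = [matvec(outer(u, u), x) for _, u in pairs]   # (-1/5, 2/5) and (6/5, 3/5), up to rounding
scaled = [[lam * p for p in proj] for (lam, _), proj in zip(pairs, projections)]
image = [sum(col) for col in zip(*scaled)]                  # (3, 0), up to rounding

print(projections)
print(scaled)
print(image, matvec(A, x))
```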



Figure 7.2.1 














































The Nondiagonalizable Case 


If A is an n × n matrix that is not orthogonally diagonalizable, it may still be possible to achieve considerable simplification
in the form of $P^TAP$ by choosing the orthogonal matrix P appropriately. We will consider two theorems (without proof) that
illustrate this. The first, due to the German mathematician Issai Schur, states that every square matrix A with real entries and real eigenvalues is orthogonally
similar to an upper triangular matrix that has the eigenvalues of A on the main diagonal.


Schur's Theorem 


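If A is an n × n matrix with real entries and real eigenvalues, then there is an orthogonal matrix P such that $P^TAP$ is
an upper triangular matrix of the form

$$P^TAP = \begin{bmatrix} \lambda_1 & \times & \times & \cdots & \times \\ 0 & \lambda_2 & \times & \cdots & \times \\ 0 & 0 & \lambda_3 & \cdots & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix} \qquad (11)$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of the matrix A repeated according to multiplicity.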



Issai Schur (1875-1941) 

The life of the German mathematician Issai Schur is a sad reminder of the effect that Nazi policies 
had on Jewish intellectuals during the 1930s. Schur was a brilliant mathematician and a popular lecturer who 
attracted many students and researchers to the University of Berlin, where he worked and taught. His lectures 
sometimes attracted so many students that opera glasses were needed to see him from the back row. Schur's life 
became increasingly difficult under Nazi rule, and in April of 1933 he was forced to “retire” from the university 
under a law that prohibited non-Aryans from holding “civil service” positions. There was an outcry from many of his 
students and colleagues who respected and liked him, but it did not stave off his complete dismissal in 1935. Schur, 
who thought of himself as a loyal German, never understood the persecution and humiliation he received at Nazi
hands. He left Germany for Palestine in 1939, a broken man. Lacking in financial resources, he had to sell his 
beloved mathematics books and lived in poverty until his death in 1941. 

[Image: Courtesy Electronic Publishing Services, Inc., New York City] 


It is common to denote the upper triangular matrix in (11) by S (for Schur), in which case that equation can be rewritten as

$$A = PSP^T \qquad (12)$$

which is called a Schur decomposition of A.

The next theorem, due to the German mathematician and engineer Karl Hessenberg (1904-1959), states that every square 
matrix with real entries is orthogonally similar to a matrix in which each entry below the first subdiagonal is zero (Figure 
7.2.2). Such a matrix is said to be in upper Hessenberg form. 

Figure 7.2.2 [a matrix in upper Hessenberg form, with the first subdiagonal labeled]


Hessenberg's Theorem 


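If A is an n × n matrix, then there is an orthogonal matrix P such that $P^TAP$ is a matrix of the form

$$P^TAP = \begin{bmatrix} \times & \times & \cdots & \times & \times & \times \\ \times & \times & \cdots & \times & \times & \times \\ 0 & \times & \cdots & \times & \times & \times \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & \times & \times & \times \\ 0 & 0 & \cdots & 0 & \times & \times \end{bmatrix} \qquad (13)$$

Note that unlike those in (11), the diagonal entries in (13)
are usually not the eigenvalues of A.

It is common to denote the upper Hessenberg matrix in (13) by H (for Hessenberg), in which case that equation can be
rewritten as

$$A = PHP^T \qquad (14)$$

which is called an upper Hessenberg decomposition of A.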

In many numerical algorithms the initial matrix is first converted to upper Hessenberg form to reduce the amount 
of computation in subsequent parts of the algorithm. Many computer packages have built-in commands for finding Schur and 
Hessenberg decompositions. 
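The text states Hessenberg's Theorem without proof. One standard constructive route, which the text does not describe and which this sketch adopts as an assumption, applies a sequence of Householder reflections. Each reflection is symmetric and orthogonal. The plain-Python sketch below uses hypothetical function names and makes no attempt at efficiency. It returns P and H with $H = P^TAP$ for an arbitrarily chosen 4 × 4 matrix.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def hessenberg(A):
    """Return (P, H) with P orthogonal, H upper Hessenberg, and H = P^T A P,
    built from Householder reflections."""
    n = len(A)
    H = [[float(x) for x in row] for row in A]
    P = identity(n)
    for k in range(n - 2):
        x = [H[i][k] for i in range(k + 1, n)]       # entries on and below the subdiagonal
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            continue                                  # nothing to eliminate in this column
        v = x[:]
        v[0] += alpha if x[0] >= 0 else -alpha        # sign choice avoids cancellation
        vv = sum(t * t for t in v)
        w = [0.0] * (k + 1) + v                       # embed v in R^n
        # Householder reflector Q = I - 2 w w^T / (w^T w): symmetric and orthogonal.
        Q = [[(1.0 if i == j else 0.0) - 2.0 * w[i] * w[j] / vv for j in range(n)]
             for i in range(n)]
        H = matmul(matmul(Q, H), Q)                   # Q^T H Q, since Q^T = Q
        P = matmul(P, Q)
    return P, H

A = [[4, 1, 2, 3], [2, 5, 1, 0], [1, 3, 6, 2], [3, 0, 1, 7]]
P, H = hessenberg(A)
for row in H:
    print([round(x, 6) for x in row])                 # zeros below the first subdiagonal
```

For a symmetric A the same procedure produces a tridiagonal H, because $P^TAP$ is then both symmetric and upper Hessenberg. In practice one would call a library routine. SciPy, for example, provides `scipy.linalg.hessenberg` and `scipy.linalg.schur`.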


Concept Review 

Orthogonally similar matrices 







Orthogonally diagonalizable matrix 

Spectral decomposition (or eigenvalue decomposition) 

Schur decomposition 

Subdiagonal 

Upper Hessenberg form

Upper Hessenberg decomposition

Skills 

Be able to recognize an orthogonally diagonalizable matrix. 

Know that eigenvalues of symmetric matrices are real numbers. 

Know that for a symmetric matrix eigenvectors from different eigenspaces are orthogonal. 
Be able to orthogonally diagonalize a symmetric matrix. 

Be able to find the spectral decomposition of a symmetric matrix. 

Know the statement of Schur’s Theorem. 

Know the statement of Hessenberg's Theorem.


Exercise Set 7.2 


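1. Find the characteristic equation of the given symmetric matrix, and then by inspection determine the dimensions of the
eigenspaces.

(a) $\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{bmatrix}$ (c) $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$ (e) $\begin{bmatrix} 4 & 4 & 0 & 0 \\ 4 & 4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ (f) $\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}$


Answer:


(a) $\lambda^2 - 5\lambda = 0$; $\lambda = 0$: one-dimensional; $\lambda = 5$: one-dimensional

(b) $\lambda^3 - 27\lambda - 54 = 0$; $\lambda = 6$: one-dimensional; $\lambda = -3$: two-dimensional

(c) $\lambda^3 - 3\lambda^2 = 0$; $\lambda = 3$: one-dimensional; $\lambda = 0$: two-dimensional

(d) $\lambda^3 - 12\lambda^2 + 36\lambda - 32 = 0$; $\lambda = 2$: two-dimensional; $\lambda = 8$: one-dimensional

(e) $\lambda^4 - 8\lambda^3 = 0$; $\lambda = 0$: three-dimensional; $\lambda = 8$: one-dimensional

(f) $\lambda^4 - 8\lambda^3 + 22\lambda^2 - 24\lambda + 9 = 0$; $\lambda = 1$: two-dimensional; $\lambda = 3$: two-dimensional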

In Exercises 2-9, find a matrix P that orthogonally diagonalizes A, and determine $P^{-1}AP$.



5. $A = \begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$


Answer: 



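6. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

7. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$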





No 

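16. Find the spectral decomposition of each matrix.

(a) $\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

(b) $\begin{bmatrix} 6 & -2 \\ -2 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} -3 & 1 & 2 \\ 1 & -3 & 2 \\ 2 & 2 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$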


17. Show that if A is a symmetric orthogonal matrix, then 1 and −1 are the only possible eigenvalues.

18. (a) Find a 3 × 3 symmetric matrix whose eigenvalues are $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and for which the corresponding
eigenvectors are $\mathbf{v}_1 = (0, 1, -1)$, $\mathbf{v}_2 = (1, 0, 0)$, $\mathbf{v}_3 = (0, 1, 1)$.

(b) Is there a 3 × 3 symmetric matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and corresponding eigenvectors
$\mathbf{v}_1 = (0, 1, -1)$, $\mathbf{v}_2 = (1, 0, 0)$, $\mathbf{v}_3 = (1, 1, 1)$? Explain your reasoning.

19. Let A be a diagonalizable matrix with the property that eigenvectors from distinct eigenvalues are orthogonal. Must A be
symmetric? Explain your reasoning.


Answer: 


Yes 


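20. Prove: If $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is an orthonormal basis for $R^n$, and if A can be expressed as

$$A = c_1\mathbf{u}_1\mathbf{u}_1^T + c_2\mathbf{u}_2\mathbf{u}_2^T + \cdots + c_n\mathbf{u}_n\mathbf{u}_n^T$$

then A is symmetric and has eigenvalues $c_1, c_2, \ldots, c_n$.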


21. In this exercise we will establish that a matrix A is orthogonally diagonalizable if and only if it is symmetric. We have 
shown that an orthogonally diagonalizable matrix is symmetric. The harder part is to prove that a symmetric matrix A is 
orthogonally diagonalizable. We will proceed in two steps: first we will show that A is diagonalizable, and then we will 
build on that result to show that A is orthogonally diagonalizable. 


(a) Assume that A is a symmetric n × n matrix. One way to prove that A is diagonalizable is to show that for each
eigenvalue $\lambda_0$ the geometric multiplicity is equal to the algebraic multiplicity. For this purpose, assume that the
geometric multiplicity of $\lambda_0$ is k, let $B_0 = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ be an orthonormal basis for the eigenspace corresponding
to $\lambda_0$, extend this to an orthonormal basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$, and let P be the matrix having the vectors of
B as columns. As shown in Exercise 34(b) of Section 5.2, the product AP can be written as

$$AP = P\begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}$$

Use the fact that B is an orthonormal basis to prove that X = 0 [a zero matrix of size k × (n − k)].


(b) It follows from part (a) and Exercise 34(c) of Section 5.2 that A has the same characteristic polynomial as

$$C = \begin{bmatrix} \lambda_0 I_k & 0 \\ 0 & Y \end{bmatrix}$$

Use this fact and Exercise 34(d) of Section 5.2 to prove that the algebraic multiplicity of $\lambda_0$ is the same as the
geometric multiplicity of $\lambda_0$. This establishes that A is diagonalizable.

(c) Use Theorem 7.2.2(b) and the fact that A is diagonalizable to prove that A is orthogonally diagonalizable.


True-False Exercises 


In parts (a)-(g) determine whether the statement is true or false, and justify your answer.
(a) If A is a square matrix, then $AA^T$ and $A^TA$ are orthogonally diagonalizable.

Answer: 


True 








(b) If $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors from distinct eigenspaces of a symmetric matrix, then $\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2$.
Answer: 

True 

(c) Every orthogonal matrix is orthogonally diagonalizable. 

Answer: 

False 

(d) If A is both invertible and orthogonally diagonalizable, then $A^{-1}$ is orthogonally diagonalizable.

Answer: 

True 

(e) Every eigenvalue of an orthogonal matrix has absolute value 1. 

Answer: 

True 

(f) If A is an n x n orthogonally diagonalizable matrix, then there exists an orthonormal basis for R n consisting of 
eigenvectors of A. 

Answer: 

True

(g) If A is orthogonally diagonalizable, then A has real eigenvalues.

Answer: 

True 





7.3 Quadratic Forms 

In this section we will use matrix methods to study real-valued functions of several variables in which each term is either the 
square of a variable or the product of two variables. Such functions arise in a variety of applications, including geometry, 
vibrations of mechanical systems, statistics, and electrical engineering. 


Definition of a Quadratic Form 


Expressions of the form 


$$ a_1x_1 + a_2x_2 + \cdots + a_nx_n $$

occurred in our study of linear equations and linear systems. If $a_1, a_2, \ldots, a_n$ are treated as fixed constants, then this expression is a real-valued function of the n variables $x_1, x_2, \ldots, x_n$ and is called a linear form on $R^n$. All variables in a linear form occur to the first power, and there are no products of variables. Here we will be concerned with quadratic forms on $R^n$, which are functions of the form


$$ a_1x_1^2 + a_2x_2^2 + \cdots + a_nx_n^2 + (\text{all possible terms } a_kx_ix_j \text{ in which } x_i \neq x_j) $$

The terms of the form $a_kx_ix_j$ are called cross product terms. It is common to combine the cross product terms involving $x_ix_j$ with those involving $x_jx_i$ to avoid duplication. Thus, a general quadratic form on $R^2$ would typically be expressed as


$$ a_1x_1^2 + a_2x_2^2 + 2a_3x_1x_2 \tag{1} $$

and a general quadratic form on $R^3$ as

$$ a_1x_1^2 + a_2x_2^2 + a_3x_3^2 + 2a_4x_1x_2 + 2a_5x_1x_3 + 2a_6x_2x_3 \tag{2} $$

If, as usual, we do not distinguish between the number a and the 1 × 1 matrix [a], and if we let x be the column vector of variables, then (1) and (2) can be expressed in matrix form as

$$ \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_1 & a_3 \\ a_3 & a_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{x}^TA\mathbf{x} $$

$$ \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} a_1 & a_4 & a_5 \\ a_4 & a_2 & a_6 \\ a_5 & a_6 & a_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{x}^TA\mathbf{x} $$

(verify). Note that the matrix A in these formulas is symmetric, that its diagonal entries are the coefficients of the squared terms, and that its off-diagonal entries are half the coefficients of the cross product terms. In general, if A is a symmetric n × n matrix and x is an n × 1 column vector of variables, then we call the function

$$ Q_A(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} \tag{3} $$

the quadratic form associated with A. When convenient, (3) can be expressed in dot product notation as

$$ \mathbf{x}^TA\mathbf{x} = \mathbf{x}\cdot A\mathbf{x} = A\mathbf{x}\cdot\mathbf{x} \tag{4} $$


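In the case where A is a diagonal matrix, the quadratic form $\mathbf{x}^TA\mathbf{x}$ has no cross product terms. For example, if A has diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$, then

$$ \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \lambda_1x_1^2 + \lambda_2x_2^2 + \cdots + \lambda_nx_n^2 $$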













EXAMPLE 1 Expressing Quadratic Forms in Matrix Notation 


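In each part, express the quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is symmetric.

(a) $2x^2 + 6xy - 5y^2$

(b) $x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3$

The diagonal entries of A are the coefficients of the squared terms, and the off-diagonal entries are half the coefficients of the cross product terms, so

$$ 2x^2 + 6xy - 5y^2 = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 3 & -5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$

$$ x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \\ 2 & 7 & 4 \\ -1 & 4 & -3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} $$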
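The rule used in Example 1 can be checked numerically. Squared-term coefficients go on the diagonal, and each cross product coefficient is split in half between the two symmetric off-diagonal positions. The plain-Python sketch below builds A from the coefficients and compares $\mathbf{x}^TA\mathbf{x}$ with the original polynomial at a sample point. It is an illustration only. The helper names `symmetric_matrix`, `quadratic_form`, and `polynomial` are hypothetical. The convention of storing integer coefficients in a dictionary keyed by 0-based index pairs (i, j) with i ≤ j is also an assumption of the example.

```python
from fractions import Fraction


def symmetric_matrix(n, coeffs):
    """Return the symmetric n x n matrix A of a quadratic form on R^n.

    coeffs maps a 0-based index pair (i, j) with i <= j to the integer
    (or Fraction) coefficient of x_i * x_j, or of x_i**2 when i == j.
    """
    A = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), c in coeffs.items():
        if i == j:
            A[i][i] += Fraction(c)       # squared term: diagonal entry
        else:
            A[i][j] += Fraction(c, 2)    # cross term: half goes to (i, j)
            A[j][i] += Fraction(c, 2)    # and half goes to (j, i)
    return A


def quadratic_form(A, x):
    """Evaluate x^T A x."""
    n = len(A)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def polynomial(coeffs, x):
    """Evaluate the quadratic polynomial directly from its coefficients."""
    return sum(c * x[i] * x[j] for (i, j), c in coeffs.items())


# Part (b) of Example 1: x1^2 + 7x2^2 - 3x3^2 + 4x1x2 - 2x1x3 + 8x2x3
coeffs_b = {(0, 0): 1, (1, 1): 7, (2, 2): -3, (0, 1): 4, (0, 2): -2, (1, 2): 8}
A = symmetric_matrix(3, coeffs_b)
print([[int(v) for v in row] for row in A])  # [[1, 2, -1], [2, 7, 4], [-1, 4, -3]]
print(quadratic_form(A, (1, 2, 3)) == polynomial(coeffs_b, (1, 2, 3)))  # True
```

Exact `Fraction` arithmetic is used so that halved odd coefficients (for example 5/2) are represented without rounding.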


Change of Variable in a Quadratic Form 

There are three important kinds of problems that occur in applications of quadratic forms: 


• If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^2$ or $R^3$, what kind of curve or surface is represented by the equation $\mathbf{x}^TA\mathbf{x} = k$?

• If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what conditions must A satisfy for $\mathbf{x}^TA\mathbf{x}$ to have positive values for $\mathbf{x} \neq \mathbf{0}$?

• If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what are its maximum and minimum values if x is constrained to satisfy $\|\mathbf{x}\| = 1$?


We will consider the first two problems in this section and the third problem in the next section. 

Many of the techniques for solving these problems are based on simplifying the quadratic form $\mathbf{x}^TA\mathbf{x}$ by making a substitution

$$ \mathbf{x} = P\mathbf{y} \tag{5} $$

that expresses the variables $x_1, x_2, \ldots, x_n$ in terms of new variables $y_1, y_2, \ldots, y_n$. If P is invertible, then we call (5) a change of variable, and if P is orthogonal, then we call (5) an orthogonal change of variable.

If we make the change of variable $\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$, then we obtain

$$ \mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^TP^TAP\mathbf{y} = \mathbf{y}^T(P^TAP)\mathbf{y} \tag{6} $$

Since the matrix $B = P^TAP$ is symmetric (verify), the effect of the change of variable is to produce a new quadratic form $\mathbf{y}^TB\mathbf{y}$ in the variables $y_1, y_2, \ldots, y_n$. In particular, if we choose P to orthogonally diagonalize A, then the new quadratic form will be $\mathbf{y}^TD\mathbf{y}$, where D is a diagonal matrix with the eigenvalues of A on the main diagonal; that is,











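$$ \mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 $$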


Thus, we have the following result, called the principal axes theorem. 


THEOREM 7.3.1 The Principal Axes Theorem

If A is a symmetric n × n matrix, then there is an orthogonal change of variable that transforms the quadratic form $\mathbf{x}^TA\mathbf{x}$ into a quadratic form $\mathbf{y}^TD\mathbf{y}$ with no cross product terms. Specifically, if P orthogonally diagonalizes A, then making the change of variable $\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$ yields the quadratic form

$$ \mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 $$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A corresponding to the eigenvectors that form the successive columns of P.


EXAMPLE 2 An Illustration of the Principal Axes Theorem 


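Find an orthogonal change of variable that eliminates the cross product terms in the quadratic form $Q = x_1^2 - x_3^2 - 4x_1x_2 + 4x_2x_3$, and express Q in terms of the new variables.

The quadratic form can be expressed in matrix notation as

$$ Q = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} $$

The characteristic equation of the matrix A is

$$ \begin{vmatrix} \lambda - 1 & 2 & 0 \\ 2 & \lambda & -2 \\ 0 & -2 & \lambda + 1 \end{vmatrix} = \lambda^3 - 9\lambda = \lambda(\lambda + 3)(\lambda - 3) = 0 $$

so the eigenvalues are $\lambda = 0, -3, 3$. We leave it for you to show that orthonormal bases for the three eigenspaces are

$$ \lambda = 0: \begin{bmatrix} \tfrac{2}{3} \\ \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix}, \quad \lambda = -3: \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \\ -\tfrac{2}{3} \end{bmatrix}, \quad \lambda = 3: \begin{bmatrix} \tfrac{2}{3} \\ -\tfrac{2}{3} \\ -\tfrac{1}{3} \end{bmatrix} $$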


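Thus, a substitution $\mathbf{x} = P\mathbf{y}$ that eliminates the cross product terms is

$$ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{1}{3} & \tfrac{2}{3} & -\tfrac{2}{3} \\ \tfrac{2}{3} & -\tfrac{2}{3} & -\tfrac{1}{3} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} $$

This produces the new quadratic form

$$ Q = \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = -3y_2^2 + 3y_3^2 $$

in which there are no cross product terms.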
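The claim that this substitution removes the cross product terms can be confirmed exactly with rational arithmetic. The sketch below is plain Python using `fractions.Fraction`. It forms $P^TP$ (which should be the identity) and $P^TAP$ (which should be diag(0, −3, 3)) for the matrices of Example 2. The helpers `matmul` and `transpose` are illustrative names, not library functions.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


A = [[1, -2, 0],
     [-2, 0, 2],
     [0, 2, -1]]

# Columns: unit eigenvectors for lambda = 0, -3, 3, in that order.
P = [[F(2, 3), F(1, 3), F(2, 3)],
     [F(1, 3), F(2, 3), F(-2, 3)],
     [F(2, 3), F(-2, 3), F(-1, 3)]]

PtP = matmul(transpose(P), P)              # identity, since P is orthogonal
D = matmul(matmul(transpose(P), A), P)     # P^T A P
print([[int(v) for v in row] for row in D])  # [[0, 0, 0], [0, -3, 0], [0, 0, 3]]
```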


If A is a symmetric n × n matrix, then the quadratic form $\mathbf{x}^TA\mathbf{x}$ is a real-valued function whose range is the set of all possible values for $\mathbf{x}^TA\mathbf{x}$ as x varies over $R^n$. It can be shown that an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ does not alter the range of a quadratic form (since P is invertible, $\mathbf{x} = P\mathbf{y}$ runs over all of $R^n$ exactly once as y does). That is, the set of all values for $\mathbf{x}^TA\mathbf{x}$ as x varies over $R^n$ is the same as the set of all values for $\mathbf{y}^T(P^TAP)\mathbf{y}$ as y varies over $R^n$.


Quadratic Forms in Geometry 

Recall that a conic section or conic is a curve that results from cutting a double-napped cone with a plane (Figure 7.3.1). The most
important conic sections are ellipses, hyperbolas, and parabolas, which result when the cutting plane does not pass through the 
vertex. Circles are special cases of ellipses that result when the cutting plane is perpendicular to the axis of symmetry of the 
cone. If the cutting plane passes through the vertex, then the resulting intersection is called a degenerate conic. The possibilities 
are a point, a pair of intersecting lines, or a single line. 


Figure 7.3.1 Circle, Ellipse, Parabola, Hyperbola

Figure 7.3.2 A central conic rotated out of standard position


Quadratic forms in $R^2$ arise naturally in the study of conic sections. For example, it is shown in analytic geometry that an equation of the form

$$ ax^2 + 2bxy + cy^2 + dx + ey + f = 0 \tag{7} $$

in which a, b, and c are not all zero, represents a conic section. If d = e = 0 in (7), then there are no linear terms, so the equation becomes

$$ ax^2 + 2bxy + cy^2 + f = 0 \tag{8} $$

and is said to represent a central conic. These include circles, ellipses, and hyperbolas, but not parabolas. Furthermore, if b = 0 in (8), then there is no cross product term (i.e., no term involving xy), and the equation

$$ ax^2 + cy^2 + f = 0 \tag{9} $$

is said to represent a central conic in standard position. The most important conics of this type are shown in Table 1. 

Table 1 


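| Conic | Standard form | Conditions |
|---|---|---|
| Ellipse | $\dfrac{x^2}{\alpha^2} + \dfrac{y^2}{\beta^2} = 1$ | $\alpha > \beta > 0$ |
| Ellipse | $\dfrac{x^2}{\alpha^2} + \dfrac{y^2}{\beta^2} = 1$ | $\beta > \alpha > 0$ |
| Hyperbola | $\dfrac{x^2}{\alpha^2} - \dfrac{y^2}{\beta^2} = 1$ | $\alpha > 0,\ \beta > 0$ |
| Hyperbola | $\dfrac{y^2}{\beta^2} - \dfrac{x^2}{\alpha^2} = 1$ | $\alpha > 0,\ \beta > 0$ |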


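If we take the constant f in Equations (8) and (9) to the right side and let k = −f, then we can rewrite these equations in matrix form as

$$ \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = k \quad\text{and}\quad \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & 0 \\ 0 & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = k \tag{10} $$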


The first of these corresponds to Equation (8), in which there is a cross product term 2bxy, and the second corresponds to Equation (9), in which there is no cross product term. Geometrically, the existence of a cross product term signals that the graph of the quadratic form is rotated about the origin, as in Figure 7.3.2. The three-dimensional analogs of the equations in (10) are



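$$ \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} a & d & e \\ d & b & f \\ e & f & c \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = k \quad\text{and}\quad \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = k \tag{11} $$

If a, b, and c are not all zero, then the graphs of these equations in $R^3$ are called central quadrics in standard position.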


Identifying Conic Sections 

We are now ready to consider the first of the three problems posed earlier: identifying the curve or surface represented by an equation $\mathbf{x}^TA\mathbf{x} = k$ in two or three variables. We will focus on the two-variable case. We noted above that an equation of the form

$$ ax^2 + 2bxy + cy^2 + f = 0 \tag{12} $$

represents a central conic. If b = 0, then the conic is in standard position, and if b ≠ 0, it is rotated. It is an easy matter to identify central conics in standard position by matching the equation with one of the standard forms. For example, the equation


$$ 9x^2 + 16y^2 - 144 = 0 $$

can be rewritten as

$$ \frac{x^2}{16} + \frac{y^2}{9} = 1 $$

which, by comparison with Table 1, is the ellipse shown in Figure 7.3.3.

Figure 7.3.3 The ellipse $x^2/16 + y^2/9 = 1$

If a central conic is rotated out of standard position, then it can be identified by first rotating the coordinate axes to put it in 
standard position and then matching the resulting equation with one of the standard forms in Table 1. To find a rotation that 
eliminates the cross product term in the equation 


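$$ ax^2 + 2bxy + cy^2 = k \tag{13} $$

it will be convenient to express the equation in the matrix form

$$ \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = k \tag{14} $$

and look for a change of variable

$$ \mathbf{x} = P\mathbf{x}' $$

that diagonalizes A and for which det(P) = 1. Since we saw in Example 4 of Section 7.1 that the transition matrix

$$ P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{15} $$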


has the effect of rotating the xy-axes of a rectangular coordinate system through an angle θ, our problem reduces to finding θ that diagonalizes A, thereby eliminating the cross product term in (13). If we make this change of variable, then in the x'y'-coordinate system, Equation (14) will become

$$ \mathbf{x}'^TD\mathbf{x}' = \begin{bmatrix} x' & y' \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix} = k \tag{16} $$

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of A. The conic can now be identified by writing (16) in the form

$$ \lambda_1x'^2 + \lambda_2y'^2 = k \tag{17} $$

and performing the necessary algebra to match it with one of the standard forms in Table 1. For example, if $\lambda_1$, $\lambda_2$, and k are positive, then (17) represents an ellipse with an axis of length $2\sqrt{k/\lambda_1}$ in the x'-direction and $2\sqrt{k/\lambda_2}$ in the y'-direction. The first column vector of P, which is a unit eigenvector corresponding to $\lambda_1$, is along the positive x'-axis. The second column vector of P, which is a unit eigenvector corresponding to $\lambda_2$, is a unit vector along the y'-axis. These are called the principal axes of the ellipse, which explains why Theorem 7.3.1 is called "the principal axes theorem." (See Figure 7.3.4.)
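For a 2 × 2 symmetric matrix the eigenvalues, a rotation matrix P, and the axis lengths $2\sqrt{k/\lambda}$ can all be written in closed form, so the recipe above is easy to sketch in code. The function below is a hypothetical helper, not taken from the text. It assumes the ellipse case ($\lambda_1$, $\lambda_2$, and k all positive) and orders the eigenvalues so that $\lambda_1 \le \lambda_2$. It also fixes the sign of the first column so that the columns of P form a rotation (det P = 1).

```python
import math


def principal_axes(a, b, c, k):
    """Principal axes of the central conic a x^2 + 2b xy + c y^2 = k.

    Assumes the ellipse case: both eigenvalues of A = [[a, b], [b, c]]
    and k are positive.  Returns (lam1, lam2, P, len1, len2) where
    lam1 <= lam2, P is a rotation matrix (det P = 1) whose columns are
    unit eigenvectors for lam1 and lam2, and len1, len2 are the axis
    lengths 2*sqrt(k/lam1) (x'-direction) and 2*sqrt(k/lam2) (y'-direction).
    """
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = mean - radius, mean + radius
    if not (lam1 > 0 and k > 0):
        raise ValueError("not an ellipse: need lam1, lam2 and k positive")
    if b != 0:
        u = (b, lam1 - a)            # solves (a - lam1) u1 + b u2 = 0
    elif a <= c:
        u = (1.0, 0.0)               # no cross term: x-axis is already an axis
    else:
        u = (0.0, 1.0)
    norm = math.hypot(u[0], u[1])
    u = (u[0] / norm, u[1] / norm)
    if u[0] < 0 or (u[0] == 0 and u[1] < 0):
        u = (-u[0], -u[1])           # sign choice: point into the right half-plane
    v = (-u[1], u[0])                # u rotated by 90 degrees, so det P = 1
    P = [[u[0], v[0]],
         [u[1], v[1]]]
    return lam1, lam2, P, 2 * math.sqrt(k / lam1), 2 * math.sqrt(k / lam2)


lam1, lam2, P, len1, len2 = principal_axes(5, -2, 8, 36)
print(round(lam1, 6), round(lam2, 6), round(len1, 6), round(len2, 6))  # 4.0 9.0 6.0 4.0
```

With a = 5, b = −2, c = 8, k = 36 (the equation of Example 3 below), the returned P has columns $(2/\sqrt5, 1/\sqrt5)$ and $(-1/\sqrt5, 2/\sqrt5)$. Taking the second column as the first one rotated by 90° is one way to guarantee det P = 1. Interchanging columns, as the text does, is another.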



EXAMPLE 3 Identifying a Conic by Eliminating the Cross Product Term 

(a) Identify the conic whose equation is $5x^2 - 4xy + 8y^2 - 36 = 0$ by rotating the xy-axes to put the conic in standard position.

(b) Find the angle θ through which you rotated the xy-axes in part (a).


Solution 

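(a) The given equation can be written in the matrix form

$$ \mathbf{x}^TA\mathbf{x} = 36 $$

where

$$ A = \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix} $$

The characteristic polynomial of A is

$$ \begin{vmatrix} \lambda - 5 & 2 \\ 2 & \lambda - 8 \end{vmatrix} = (\lambda - 4)(\lambda - 9) $$

so the eigenvalues are λ = 4 and λ = 9. We leave it for you to show that orthonormal bases for the eigenspaces are

$$ \lambda = 4: \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}, \qquad \lambda = 9: \begin{bmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix} $$

Thus, A is orthogonally diagonalized by

$$ P = \begin{bmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix} \tag{18} $$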


Had it turned out that det(P) = −1, then we would have interchanged the columns to reverse the sign.














Moreover, it happens by chance that det(P) = 1, so we are assured that the substitution $\mathbf{x} = P\mathbf{x}'$ performs a rotation of axes. It follows from (16) that the equation of the conic in the x'y'-coordinate system is

$$ \begin{bmatrix} x' & y' \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix} = 36 $$

which we can write as

$$ 4x'^2 + 9y'^2 = 36 \quad\text{or}\quad \frac{x'^2}{9} + \frac{y'^2}{4} = 1 $$

We can now see from Table 1 that the conic is an ellipse whose axis has length 2α = 6 in the x'-direction and length 2β = 4 in the y'-direction.


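(b) It follows from (15) that

$$ P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix} $$

which implies that

$$ \cos\theta = \frac{2}{\sqrt5}, \qquad \sin\theta = \frac{1}{\sqrt5}, \qquad \tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{1}{2} $$

Thus, $\theta = \tan^{-1}\tfrac{1}{2} \approx 26.6^\circ$ (Figure 7.3.5).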



Figure 7.3.5 


In the exercises we will ask you to show that if b ≠ 0, then the cross product term in the equation

$$ ax^2 + 2bxy + cy^2 = k $$

can be eliminated by a rotation through an angle θ that satisfies

$$ \cot 2\theta = \frac{a - c}{2b} $$

We leave it for you to confirm that this is consistent with part (b) of the last example.
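The angle condition $\cot 2\theta = (a - c)/(2b)$ translates into a short computation. In this sketch (plain Python; the function names are illustrative), θ is taken in the interval (0, π/2) by using arccot(t) = π/2 − arctan(t). The second function evaluates the off-diagonal entry of $P^TAP$ for the matrix P in (15), namely $(c - a)\sin\theta\cos\theta + b(\cos^2\theta - \sin^2\theta)$, which should vanish at that angle.

```python
import math


def rotation_angle(a, b, c):
    """Angle theta in (0, pi/2) with cot(2*theta) = (a - c) / (2*b), b != 0."""
    if b == 0:
        raise ValueError("b = 0: there is no cross product term to remove")
    # arccot(t) = pi/2 - arctan(t) takes values in (0, pi)
    two_theta = math.pi / 2 - math.atan((a - c) / (2 * b))
    return two_theta / 2


def cross_coefficient_after_rotation(a, b, c, theta):
    """Off-diagonal entry of P^T A P, with A = [[a, b], [b, c]] and P as in (15)."""
    s, co = math.sin(theta), math.cos(theta)
    return (c - a) * s * co + b * (co * co - s * s)


theta = rotation_angle(5, -2, 8)          # the equation of Example 3
print(round(math.degrees(theta), 1))      # 26.6
```

For Exercise 15 below (a = 11, b = 12, c = 4) the same function gives θ ≈ 36.9°. Angles differing from these by multiples of 90° also remove the cross term; they only interchange or reverse the new axes.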


Positive Definite Quadratic Forms 

We will now consider the second of the three problems posed earlier: determining conditions under which $\mathbf{x}^TA\mathbf{x} > 0$ for all nonzero values of x. We will explain why this is important shortly, but first we introduce some terminology.

















The terminology in Definition 1 also applies to the 
matrix A; that is, A is positive definite, negative definite, 
or indefinite in accordance with whether the associated 
quadratic form has that property. 


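DEFINITION 1

A quadratic form $\mathbf{x}^TA\mathbf{x}$ is said to be

positive definite if $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$;

negative definite if $\mathbf{x}^TA\mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$;

indefinite if $\mathbf{x}^TA\mathbf{x}$ has both positive and negative values.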


The following theorem, whose proof is deferred to the end of the section, provides a way of using eigenvalues to determine whether a matrix A and its associated quadratic form $\mathbf{x}^TA\mathbf{x}$ are positive definite, negative definite, or indefinite.


THEOREM 7.3.2 

If A is a symmetric matrix, then: 

(a) $\mathbf{x}^TA\mathbf{x}$ is positive definite if and only if all eigenvalues of A are positive.

(b) $\mathbf{x}^TA\mathbf{x}$ is negative definite if and only if all eigenvalues of A are negative.

(c) $\mathbf{x}^TA\mathbf{x}$ is indefinite if and only if A has at least one positive eigenvalue and at least one negative eigenvalue.


The three classifications in Definition 1 do not exhaust all of the possibilities. For example, a quadratic form for which $\mathbf{x}^TA\mathbf{x} \ge 0$ if $\mathbf{x} \neq \mathbf{0}$ is called positive semidefinite, and one for which $\mathbf{x}^TA\mathbf{x} \le 0$ if $\mathbf{x} \neq \mathbf{0}$ is called negative semidefinite. Every positive definite form is positive semidefinite, but not conversely, and every negative definite form is negative semidefinite, but not conversely (why?). By adjusting the proof of Theorem 7.3.2 appropriately, one can prove that $\mathbf{x}^TA\mathbf{x}$ is positive semidefinite if and only if all eigenvalues of A are nonnegative and is negative semidefinite if and only if all eigenvalues of A are nonpositive.
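Theorem 7.3.2 and the remark on semidefinite forms reduce classification to the signs of the eigenvalues. The sketch below computes the eigenvalues of a real symmetric matrix with the cyclic Jacobi rotation method, then applies those sign tests. The Jacobi method is a standard technique but is not the method used in the text; it is swapped in here only so that the example needs no external libraries. Because the arithmetic is floating point, eigenvalues within a tolerance `tol` of zero are treated as zero. That threshold, like the function names, is an assumption of the sketch.

```python
import math


def symmetric_eigenvalues(A, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi method."""
    n = len(A)
    a = [[float(v) for v in row] for row in A]     # work on a float copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Plane rotation in the (p, q) plane chosen so that the new
                # entry a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):                 # columns p, q: a <- a J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):                 # rows p, q: a <- J^T a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def classify(A, tol=1e-10):
    """Classify x^T A x (A symmetric) by the signs of its eigenvalues."""
    eig = symmetric_eigenvalues(A)
    has_pos = any(v > tol for v in eig)
    has_neg = any(v < -tol for v in eig)
    has_zero = any(abs(v) <= tol for v in eig)
    if has_pos and has_neg:
        return "indefinite"
    if has_pos:
        return "positive semidefinite" if has_zero else "positive definite"
    if has_neg:
        return "negative semidefinite" if has_zero else "negative definite"
    return "zero form (both positive and negative semidefinite)"


M = [[3, 1, 1], [1, 0, 2], [1, 2, 0]]                 # matrix of Example 4
print([round(v, 9) for v in symmetric_eigenvalues(M)])  # [-2.0, 1.0, 4.0]
print(classify(M))                                      # indefinite
```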


EXAMPLE 4 Positive Definite Quadratic Forms 


It is not usually possible to tell from the signs of the entries in a symmetric matrix A whether that matrix is positive definite, negative definite, or indefinite. For example, the entries of the matrix

$$ A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix} $$

are nonnegative, but the matrix is indefinite since its eigenvalues are λ = 1, 4, −2 (verify). To see this another way, let us write out the quadratic form as

$$ \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 3x_1^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3 $$










Positive definite and negative definite matrices 
are invertible. Why? 


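We can now see, for example, that

$$ \mathbf{x}^TA\mathbf{x} = 4 \ \text{ for } x_1 = 0,\ x_2 = 1,\ x_3 = 1 \qquad\text{and}\qquad \mathbf{x}^TA\mathbf{x} = -4 \ \text{ for } x_1 = 0,\ x_2 = 1,\ x_3 = -1 $$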


Classifying Conic Sections Using Eigenvalues 

If $\mathbf{x}^TB\mathbf{x} = k$ is the equation of a conic, and if k ≠ 0, then we can divide through by k and rewrite the equation in the form

$$ \mathbf{x}^TA\mathbf{x} = 1 \tag{20} $$

where $A = (1/k)B$. If we now rotate the coordinate axes to eliminate the cross product term (if any) in this equation, then the equation of the conic in the new coordinate system will be of the form

$$ \lambda_1x'^2 + \lambda_2y'^2 = 1 \tag{21} $$

in which $\lambda_1$ and $\lambda_2$ are the eigenvalues of A. The particular type of conic represented by this equation will depend on the signs of the eigenvalues $\lambda_1$ and $\lambda_2$. For example, you should be able to see from (21) that:

• $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if $\lambda_1 > 0$ and $\lambda_2 > 0$.

• $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if $\lambda_1 < 0$ and $\lambda_2 < 0$.

• $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if $\lambda_1$ and $\lambda_2$ have opposite signs.

In the case of the ellipse, Equation (21) can be rewritten as

$$ \frac{x'^2}{\left(1/\sqrt{\lambda_1}\right)^2} + \frac{y'^2}{\left(1/\sqrt{\lambda_2}\right)^2} = 1 \tag{22} $$

so the axes of the ellipse have lengths $2/\sqrt{\lambda_1}$ and $2/\sqrt{\lambda_2}$ (Figure 7.3.6).



The following theorem is an immediate consequence of this discussion and Theorem 7.3.2. 





THEOREM 7.3.3 


If A is a symmetric 2 × 2 matrix, then:

(a) $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if A is positive definite.

(b) $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if A is negative definite.

(c) $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if A is indefinite.


In Example 3 we performed a rotation to show that the equation

$$ 5x^2 - 4xy + 8y^2 - 36 = 0 $$

represents an ellipse with a major axis of length 6 and a minor axis of length 4. This conclusion can also be obtained by rewriting the equation in the form

$$ \frac{5}{36}x^2 - \frac{1}{9}xy + \frac{2}{9}y^2 = 1 $$

and showing that the associated matrix

$$ A = \begin{bmatrix} \frac{5}{36} & -\frac{1}{18} \\ -\frac{1}{18} & \frac{2}{9} \end{bmatrix} $$

has eigenvalues $\lambda_1 = \frac{1}{9}$ and $\lambda_2 = \frac{1}{4}$. These eigenvalues are positive, so the matrix A is positive definite and the equation represents an ellipse. Moreover, it follows from (21) that the axes of the ellipse have lengths $2/\sqrt{\lambda_1} = 6$ and $2/\sqrt{\lambda_2} = 4$, which is consistent with Example 3.
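Theorem 7.3.3 can be turned into a small classifier for $ax^2 + 2bxy + cy^2 = k$ with k ≠ 0. This is a sketch, and the function name is hypothetical. It forms $A = (1/k)B$ as in (20) and computes the two eigenvalues of A in closed form. In the ellipse case it also returns the axis lengths $2/\sqrt{\lambda_1}$ and $2/\sqrt{\lambda_2}$ from (22). A zero eigenvalue falls outside Theorem 7.3.3 and is simply reported as degenerate.

```python
import math


def identify_central_conic(a, b, c, k):
    """Identify a x^2 + 2b xy + c y^2 = k (k != 0) by the signs of eigenvalues.

    Returns (kind, axes): axes holds the two axis lengths for an ellipse
    and is None otherwise.
    """
    if k == 0:
        raise ValueError("k must be nonzero")
    p, q, r = a / k, b / k, c / k                # entries of A = (1/k) B
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    lam1, lam2 = mean - radius, mean + radius    # eigenvalues of A, lam1 <= lam2
    if lam1 > 0:                                 # A positive definite
        return "ellipse", (2 / math.sqrt(lam1), 2 / math.sqrt(lam2))
    if lam2 < 0:                                 # A negative definite
        return "no graph", None
    if lam1 < 0 < lam2:                          # A indefinite
        return "hyperbola", None
    return "degenerate (zero eigenvalue)", None  # not covered by Theorem 7.3.3


kind, axes = identify_central_conic(5, -2, 8, 36)
print(kind, [round(L, 6) for L in axes])   # ellipse [6.0, 4.0]
```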


Identifying Positive Definite Matrices 

Positive definite matrices are the most important symmetric matrices in applications, so it will be useful to learn a little more about them. We already know that a symmetric matrix is positive definite if and only if its eigenvalues are all positive. Now we will give a criterion that can be used to determine whether a symmetric matrix is positive definite without finding the eigenvalues. For this purpose we define the kth principal submatrix of an n × n matrix A to be the k × k submatrix consisting of the first k rows and columns of A. For example, here are the principal submatrices of a general 4 × 4 matrix:

First principal submatrix: $\begin{bmatrix} a_{11} \end{bmatrix}$

Second principal submatrix: $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$

Third principal submatrix: $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$

Fourth principal submatrix: $\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = A$


The following theorem, which we state without proof, provides a determinant test for ascertaining whether a symmetric matrix is 
positive definite. 


THEOREM 7.3.4 

A symmetric matrix A is positive definite if and only if the determinant of every principal submatrix is positive. 


EXAMPLE 5 Working with Principal Submatrices 
























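The matrix

$$ A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{bmatrix} $$

is positive definite since the determinants

$$ |2| = 2, \qquad \begin{vmatrix} 2 & -1 \\ -1 & 2 \end{vmatrix} = 3, \qquad \begin{vmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{vmatrix} = 1 $$

are all positive. Thus, we are guaranteed that all eigenvalues of A are positive and that $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$.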
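Theorem 7.3.4 needs only determinants, which can be computed exactly over the rationals. The sketch below (plain Python; helper names are illustrative) computes the determinants of the principal submatrices by Gaussian elimination with `fractions.Fraction` and applies the test to the matrix of Example 5. The theorem is stated for symmetric matrices; the code assumes its input is symmetric and does not check this.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination over the rationals (exact)."""
    a = [[Fraction(v) for v in row] for row in M]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result                  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for j in range(col, n):
                a[r][j] -= factor * a[col][j]
    return result


def leading_principal_minors(A):
    """Determinants of the kth principal submatrices, k = 1, ..., n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]


def is_positive_definite(A):
    """Theorem 7.3.4 for a symmetric A: every principal minor must be positive."""
    return all(m > 0 for m in leading_principal_minors(A))


A = [[2, -1, -3], [-1, 2, 4], [-3, 4, 9]]     # matrix of Example 5
print([int(m) for m in leading_principal_minors(A)])  # [2, 3, 1]
print(is_positive_definite(A))                         # True
```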


OPTIONAL 

We conclude this section with an optional proof of Theorem 7.3.2. 

Proof (a) and (b) It follows from the principal axes theorem (Theorem 7.3.1) that there is an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ for which

$$ \mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \tag{23} $$

where the λ's are the eigenvalues of A. Moreover, it follows from the invertibility of P that $\mathbf{y} \neq \mathbf{0}$ if and only if $\mathbf{x} \neq \mathbf{0}$, so the values of $\mathbf{x}^TA\mathbf{x}$ for $\mathbf{x} \neq \mathbf{0}$ are the same as the values of $\mathbf{y}^TD\mathbf{y}$ for $\mathbf{y} \neq \mathbf{0}$. Thus, it follows from (23) that $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all of the λ's in that equation are positive, and that $\mathbf{x}^TA\mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all of the λ's are negative. This proves parts (a) and (b).

(c) Assume that A has at least one positive eigenvalue and at least one negative eigenvalue, and to be specific, suppose that $\lambda_1 > 0$ and $\lambda_2 < 0$ in (23). Then

$$ \mathbf{x}^TA\mathbf{x} > 0 \ \text{ if } y_1 = 1 \text{ and all other } y\text{'s are } 0 $$

and

$$ \mathbf{x}^TA\mathbf{x} < 0 \ \text{ if } y_2 = 1 \text{ and all other } y\text{'s are } 0 $$

which proves that $\mathbf{x}^TA\mathbf{x}$ is indefinite. Conversely, if $\mathbf{x}^TA\mathbf{x} > 0$ for some x, then $\mathbf{y}^TD\mathbf{y} > 0$ for some y, so at least one of the λ's in (23) must be positive. Similarly, if $\mathbf{x}^TA\mathbf{x} < 0$ for some x, then $\mathbf{y}^TD\mathbf{y} < 0$ for some y, so at least one of the λ's in (23) must be negative, which completes the proof.


Concept Review 

• Linear form
• Quadratic form
• Cross product term
• Quadratic form associated with a matrix
• Change of variable
• Orthogonal change of variable
• Principal Axes Theorem
• Conic section
• Degenerate conic
• Central conic
• Standard position of a central conic
• Standard form of a central conic
• Central quadric
• Principal axes of an ellipse
• Positive definite quadratic form
• Negative definite quadratic form
• Indefinite quadratic form
• Positive semidefinite quadratic form
• Negative semidefinite quadratic form
• Principal submatrix

Skills 

• Express a quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is a symmetric matrix.

• Find an orthogonal change of variable that eliminates the cross product terms in a quadratic form, and express the quadratic form in terms of the new variables.

• Identify a conic section from an equation by rotating axes to place the conic in standard position, and find the angle of rotation.

• Identify a conic section using eigenvalues.

• Classify matrices and quadratic forms as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.


Exercise Set 7.3 


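In Exercises 1–2, express the quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is a symmetric matrix.

1. (a) $3x_1^2 + 7x_2^2$

(b) $4x_1^2 - 9x_2^2 - 6x_1x_2$

(c) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$

Answer:

(a) $\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

(b) $\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -3 & -9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

(c) $\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 9 & 3 & -4 \\ 3 & -1 & \tfrac{1}{2} \\ -4 & \tfrac{1}{2} & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

2. (a) $5x_1^2 + 5x_1x_2$

(b) $-7x_1x_2$

(c) $x_1^2 + x_2^2 - 3x_3^2 - 5x_1x_2 + 9x_1x_3$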

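In Exercises 3–4, find a formula for the quadratic form that does not use matrices.

3. $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & -3 \\ -3 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

Answer:

$2x^2 + 5y^2 - 6xy$

4. $\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} -2 & \tfrac{7}{2} & 1 \\ \tfrac{7}{2} & 0 & 6 \\ 1 & 6 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$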

In Exercises 5-8, find an orthogonal change of variables that eliminates the cross product terms in the quadratic form Q, and 
express Q in terms of the new variables. 

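5. $Q = 2x_1^2 + 2x_2^2 - 2x_1x_2$

Answer:

$$ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2} \\ \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}; \qquad Q = 3y_1^2 + y_2^2 $$

6. $Q = 5x_1^2 + 2x_2^2 + 4x_3^2 + 4x_1x_2$

7. $Q = 3x_1^2 + 4x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_2x_3$

Answer:

$$ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -\tfrac{2}{3} & \tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{1}{3} & \tfrac{2}{3} & -\tfrac{2}{3} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}; \qquad Q = y_1^2 + 4y_2^2 + 7y_3^2 $$

8. $Q = 2x_1^2 + 5x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_1x_3 - 8x_2x_3$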

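In Exercises 9–10, express the quadratic equation in the matrix form $\mathbf{x}^TA\mathbf{x} + K\mathbf{x} + f = 0$, where $\mathbf{x}^TA\mathbf{x}$ is the associated quadratic form and K is an appropriate matrix.

9. (a) $2x^2 + xy + x - 6y + 2 = 0$

(b) $y^2 + 7x - 8y - 5 = 0$

Answer:

(a) $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & \tfrac{1}{2} \\ \tfrac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 1 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + 2 = 0$

11. (a) $2x^2 + 5y^2 = 20$

(b) $x^2 - y^2 - 8 = 0$

(c) $7y^2 - 2x = 0$

(d) $x^2 + y^2 - 25 = 0$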


Answer: 

(a) ellipse 

(b) hyperbola 

(c) parabola 

(d) circle 

12. (a) $4x^2 + 9y^2 = 1$

(b) $4x^2 - 5y^2 = 20$

(c) $-x^2 = 2y$

(d) $x^2 - 3 = -y^2$

In Exercises 13-16, identify the conic section represented by the equation by rotating axes to place the conic in standard 
position. Find an equation of the conic in the rotated coordinates, and find the angle of rotation. 

13. $2x^2 - 4xy - y^2 + 8 = 0$

Answer:

Hyperbola: $2(y')^2 - 3(x')^2 = 8$; $\theta \approx -26.6^\circ$

14. $5x^2 + 4xy + 5y^2 = 9$

15. $11x^2 + 24xy + 4y^2 - 15 = 0$

Answer:

Hyperbola: $4(x')^2 - (y')^2 = 3$; $\theta \approx 36.9^\circ$

1 6- x 2 + xy+y 2 = ^ 

In Exercises 17-18, determine by inspection whether the matrix is positive definite, negative definite, indefinite, positive 
semidefinite, or negative semidefinite. 

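17. (a) $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ (b) $\begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}$ (c) $\begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}$ (d) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ (e) $\begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}$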

Answer: 

(a) Positive definite 

(b) Negative definite 

(c) Indefinite 

(d) Positive semidefinite 

(e) Negative semidefinite 

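18. (a) $\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}$ (b) $\begin{bmatrix} -2 & 0 \\ 0 & -5 \end{bmatrix}$ (c) $\begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}$ (d) $\begin{bmatrix} 0 & 0 \\ 0 & -5 \end{bmatrix}$ (e) $\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}$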

In Exercises 19–24, classify the quadratic form as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

19. $x_1^2 + x_2^2$

Answer:

Positive definite

20. $-x_1^2 - 3x_2^2$

21. $(x_1 - x_2)^2$

Answer:

Positive semidefinite

22. $-(x_1 - x_2)^2$


Answer: 

Indefinite 

24. $x_1x_2$

In Exercises 25-26, show that the matrix A is positive definite first by using Theorem 7.3.2 and second by using Theorem 
7.3.4. 


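25. (a) $A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$ (b) $A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

26. (a) $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ (b) $A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$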

In Exercises 27-28, find all values of k for which the quadratic form is positive definite. 

27. $5x_1^2 + x_2^2 + kx_3^2 + 4x_1x_2 - 2x_1x_3 - 2x_2x_3$

Answer:

k > 2

28. $3x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_3 + 2kx_2x_3$

29. Let $\mathbf{x}^TA\mathbf{x}$ be a quadratic form in the variables $x_1, x_2, \ldots, x_n$, and define $T: R^n \to R$ by $T(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$.

(a) Show that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + 2\mathbf{x}^TA\mathbf{y} + T(\mathbf{y})$.

(b) Show that $T(c\mathbf{x}) = c^2T(\mathbf{x})$.

30. Express the quadratic form $(c_1x_1 + c_2x_2 + \cdots + c_nx_n)^2$ in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is symmetric.

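31. In statistics, the quantities

$$ \bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) $$

and

$$ s_x^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right] $$

are called, respectively, the sample mean and sample variance of $\mathbf{x} = (x_1, x_2, \ldots, x_n)$.

(a) Express the quadratic form $s_x^2$ in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is symmetric.

(b) Is $s_x^2$ a positive definite quadratic form? Explain.

Answer:

(a)
$$ A = \begin{bmatrix} \frac{1}{n} & -\frac{1}{n(n-1)} & \cdots & -\frac{1}{n(n-1)} \\ -\frac{1}{n(n-1)} & \frac{1}{n} & \cdots & -\frac{1}{n(n-1)} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n(n-1)} & -\frac{1}{n(n-1)} & \cdots & \frac{1}{n} \end{bmatrix} $$

(b) No. $s_x^2 \ge 0$ for every x, but $s_x^2 = 0$ for the nonzero vector $\mathbf{x} = (1, 1, \ldots, 1)$, so $s_x^2$ is positive semidefinite but not positive definite.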

32. The graph in an xyz-coordinate system of an equation of the form $ax^2 + by^2 + cz^2 = 1$ in which a, b, and c are positive is a surface called a central ellipsoid in standard position (see the accompanying figure). This is the three-dimensional generalization of the ellipse $ax^2 + by^2 = 1$ in the xy-plane. The intersections of the ellipsoid $ax^2 + by^2 + cz^2 = 1$ with the coordinate axes determine three line segments called the axes of the ellipsoid. If a central ellipsoid is rotated about the origin so two or more of its axes do not coincide with any of the coordinate axes, then the resulting equation will have one or more cross product terms.

(a) Show that the equation

$$ \frac{4}{3}x^2 + \frac{4}{3}y^2 + \frac{4}{3}z^2 + \frac{4}{3}xy + \frac{4}{3}xz + \frac{4}{3}yz = 1 $$

represents an ellipsoid, and find the lengths of its axes. [Suggestion: Write the equation in the form $\mathbf{x}^TA\mathbf{x} = 1$ and make an orthogonal change of variable to eliminate the cross product terms.]

(b) What property must a symmetric 3 × 3 matrix have in order for the equation $\mathbf{x}^TA\mathbf{x} = 1$ to represent an ellipsoid?

Figure Ex-32

33. What property must a symmetric 2 × 2 matrix A have for $\mathbf{x}^TA\mathbf{x} = 1$ to represent a circle?


Answer: 


A must have a positive eigenvalue of multiplicity 2. 

34. Prove: If b ≠ 0, then the cross product term can be eliminated from the quadratic form $ax^2 + 2bxy + cy^2$ by rotating the coordinate axes through an angle θ that satisfies the equation

$$ \cot 2\theta = \frac{a - c}{2b} $$

35. Prove that if A is an n × n symmetric matrix all of whose eigenvalues are nonnegative, then $\mathbf{x}^TA\mathbf{x} \ge 0$ for all nonzero x in $R^n$.

True-False Exercises 


In parts (a)-(l) determine whether the statement is true or false, and justify your answer. 

(a) A symmetric matrix with positive eigenvalues is positive definite.

Answer: 

True 

(b) $x_1^2 - x_2^2 + x_3^2 + 4x_1x_2x_3$ is a quadratic form.

Answer: 

False 

(c) $(x_1 - 3x_2)^2$ is a quadratic form.

Answer: 

True 

(d) A positive definite matrix is invertible. 


Answer: 



True 

(e) A symmetric matrix is either positive definite, negative definite, or indefinite. 

Answer: 

False 

(f) If A is positive definite, then −A is negative definite.

Answer: 

True 

(g) $\mathbf{x}\cdot\mathbf{x}$ is a quadratic form for all x in $R^n$.

Answer: 

True 

(h) If $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form, then so is $\mathbf{x}^TA^{-1}\mathbf{x}$.
Answer: 

True 

(i) If A is a matrix with only positive eigenvalues, then $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form.

Answer: 

False 

(j) If A is a 2 × 2 symmetric matrix with positive entries and det(A) > 0, then A is positive definite.

Answer: 

True 

(k) If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form with no cross product terms, then A is a diagonal matrix.

Answer: 

False 

(l) If $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form in two variables and c ≠ 0, then the graph of the equation $\mathbf{x}^TA\mathbf{x} = c$ is an ellipse.
Answer: 

False

7.4 Optimization Using Quadratic Forms

Quadratic forms arise in various problems in which the maximum or minimum value of some quantity is required. 
In this section we will discuss some problems of this type. 


Constrained Extremum Problems 


Our first goal in this section is to consider the problem of finding the maximum and minimum values of a quadratic form $\mathbf{x}^TA\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$. Problems of this type arise in a wide variety of applications.


To visualize this problem geometrically in the case where $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^2$, view $z = \mathbf{x}^TA\mathbf{x}$ as the equation of some surface in a rectangular xyz-coordinate system, and view $\|\mathbf{x}\| = 1$ as the unit circle centered at the origin of the xy-plane. Geometrically, the problem of finding the maximum and minimum values of $\mathbf{x}^TA\mathbf{x}$ subject to the requirement $\|\mathbf{x}\| = 1$ amounts to finding the highest and lowest points on the intersection of the surface with the right circular cylinder determined by the circle (Figure 7.4.1).



Figure 7.4.1 


The following theorem, whose proof is deferred to the end of the section, is the key result for solving problems of 
this type. 


THEOREM 7.4.1 Constrained Extremum Theorem

Let A be a symmetric n × n matrix whose eigenvalues in order of decreasing size are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Then:

(a) the quadratic form $\mathbf{x}^TA\mathbf{x}$ attains a maximum value and a minimum value on the set of vectors for which $\|\mathbf{x}\| = 1$;

(b) the maximum value attained in part (a) occurs at a unit eigenvector corresponding to the eigenvalue $\lambda_1$;

(c) the minimum value attained in part (a) occurs at a unit eigenvector corresponding to the eigenvalue $\lambda_n$.

The condition $\|\mathbf{x}\| = 1$ in this theorem is called a constraint, and the maximum or minimum value of $\mathbf{x}^TA\mathbf{x}$ subject to the constraint is called a constrained extremum. This constraint can also be expressed as $\mathbf{x}^T\mathbf{x} = 1$ or as $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$ when convenient.



EXAMPLE 1 Finding Constrained Extrema 


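Find the maximum and minimum values of the quadratic form

$$ z = 5x^2 + 5y^2 + 4xy $$

subject to the constraint $x^2 + y^2 = 1$.


The quadratic form can be expressed in matrix notation as

$$ z = 5x^2 + 5y^2 + 4xy = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$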


We leave it for you to show that the eigenvalues of A are X\ = 7 and X 2 = 3 and that corresponding 
eigenvectors are 


Ai= 7 : 

Normalizing these eigenvectors yields 

Ai= 7 : 

Thus, the constrained extrema are 


, A 2 = 3: 


-1 

1 


i 


1 

f2 

, A 2 = 3: 

f2 

1 

1 

f2 




constrained maximum:z = 7 at(x, .y) = —]=, — 

1/2 { 2 } 

constrained mjnjmum z = 3 at(ar, v) = [ — 7 =, - 4 = 

l P- Pi 


a) 


Since the negatives of the eigenvectors in 1 are also unit eigenvectors, they too produce the maximum 
and minimum values of z; that is, the constrained maximum z = l also occurs at the point 


(x,y) = 


_L 

f2’ 


1 


f2 


and the constrained minimum z — 3 at (x, y) = 




_L 
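As an informal check on Example 1, one can evaluate $z$ at many equally spaced points of the unit circle and keep the largest and smallest values. This brute-force sampling is a different technique from the eigenvalue method of the example and only approximates the extrema; the helper `quadratic_form` and the sample count of 3600 are assumptions made for illustration.

```python
import math


def quadratic_form(A, x):
    """Evaluate x^T A x for a 2 x 2 matrix A stored as nested lists."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))


A = [[5, 2], [2, 5]]    # matrix of z = 5x^2 + 5y^2 + 4xy from Example 1
samples = 3600          # equally spaced points on the unit circle (an arbitrary choice)

values = []
for k in range(samples):
    t = 2 * math.pi * k / samples
    point = (math.cos(t), math.sin(t))
    values.append((quadratic_form(A, point), point))

z_max, p_max = max(values)
z_min, p_min = min(values)
print(f"largest sampled z  = {z_max:.6f} at ({p_max[0]:.4f}, {p_max[1]:.4f})")
print(f"smallest sampled z = {z_min:.6f} at ({p_min[0]:.4f}, {p_min[1]:.4f})")
```

The sampled extremes agree with 7 and 3 up to round-off, and the points where they occur lie on the lines $y = x$ and $y = -x$, as the eigenvectors predict.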


EXAMPLE 2 A Constrained Extremum Problem 

A rectangle is to be inscribed in the ellipse $4x^2 + 9y^2 = 36$, as shown in Figure 7.4.2. Use eigenvalue methods to find nonnegative values of $x$ and $y$ that produce the inscribed rectangle with maximum area.

Figure 7.4.2 A rectangle inscribed in the ellipse $4x^2 + 9y^2 = 36$; its first-quadrant corner is $(x, y)$.


Solution The area $z$ of the inscribed rectangle is given by $z = 4xy$, so the problem is to maximize the quadratic form $z = 4xy$ subject to the constraint $4x^2 + 9y^2 = 36$. In this problem, the graph of the constraint equation is an ellipse rather than the unit circle as required in Theorem 7.4.1, but we can remedy this problem by rewriting the constraint as

$$\left(\frac{x}{3}\right)^2 + \left(\frac{y}{2}\right)^2 = 1$$

and defining new variables, $x_1$ and $y_1$, by the equations

$$x = 3x_1 \quad \text{and} \quad y = 2y_1$$

This enables us to reformulate the problem as follows:

$$\text{maximize } z = 4xy = 24x_1y_1$$

subject to the constraint

$$x_1^2 + y_1^2 = 1$$

To solve this problem, we will write the quadratic form $z = 24x_1y_1$ as

$$z = \mathbf{x}^T A\mathbf{x} = \begin{bmatrix} x_1 & y_1 \end{bmatrix}\begin{bmatrix} 0 & 12 \\ 12 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$

We now leave it for you to show that the largest eigenvalue of $A$ is $\lambda = 12$ and that the only corresponding unit eigenvector with nonnegative entries is

$$\mathbf{x} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$$

Thus, the maximum area is $z = 12$, and this occurs when

$$x = 3x_1 = \frac{3}{\sqrt{2}} \quad \text{and} \quad y = 2y_1 = \frac{2}{\sqrt{2}} = \sqrt{2}$$
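The answer to Example 2 can be cross-checked without eigenvalues by parametrizing the first-quadrant arc of the ellipse as $x = 3\cos t$, $y = 2\sin t$ and scanning $t$. This grid search is a swapped-in numerical technique, not the method of the example, and the grid size is an arbitrary assumption.

```python
import math


def rectangle_area(t):
    """Area 4xy of the inscribed rectangle whose first-quadrant corner is
    (x, y) = (3 cos t, 2 sin t), a point of the ellipse 4x^2 + 9y^2 = 36."""
    x, y = 3 * math.cos(t), 2 * math.sin(t)
    return 4 * x * y


steps = 9000                                   # grid size on 0 <= t <= pi/2 (arbitrary)
grid = (math.pi / 2 * k / steps for k in range(steps + 1))
best_t = max(grid, key=rectangle_area)
x_best, y_best = 3 * math.cos(best_t), 2 * math.sin(best_t)

print(f"area = {rectangle_area(best_t):.6f}, x = {x_best:.6f}, y = {y_best:.6f}")
print(f"text: area = 12, x = 3/sqrt(2) = {3 / math.sqrt(2):.6f}, y = sqrt(2) = {math.sqrt(2):.6f}")
```

On this parametrization the area is $24\cos t\sin t = 12\sin 2t$, so the scan peaks at $t = \pi/4$, which is the same corner the eigenvalue method finds.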


Constrained Extrema and Level Curves 

A useful way of visualizing the behavior of a function $f(x, y)$ of two variables is to consider the curves in the $xy$-plane along which $f(x, y)$ is constant. These curves have equations of the form

$$f(x, y) = k$$

and are called the level curves of $f$ (Figure 7.4.3). In particular, the level curves of a quadratic form $\mathbf{x}^T A\mathbf{x}$ on $R^2$ have equations of the form

$$\mathbf{x}^T A\mathbf{x} = k \tag{2}$$

so the maximum and minimum values of $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ are the largest and smallest values of $k$ for which the graph of (2) intersects the unit circle. Typically, such values of $k$ produce level curves that just touch the unit circle (Figure 7.4.4), and the coordinates of the points where the level curves just touch produce the vectors that maximize or minimize $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$.

[Figure: the surface $z = f(x, y)$ and the plane $z = k$]




Figure 7.4.4 


EXAMPLE 3 Example 1 Revisited Using Level Curves 


In Example 1 (and its following remark) we found the maximum and minimum values of the quadratic form

$$z = 5x^2 + 5y^2 + 4xy$$

subject to the constraint $x^2 + y^2 = 1$. We showed that the constrained maximum is $z = 7$, and this is attained at the points

$$(x, y) = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right) \tag{3}$$

and that the constrained minimum is $z = 3$, and this is attained at the points

$$(x, y) = \left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right) \tag{4}$$

Geometrically, this means that the level curve $5x^2 + 5y^2 + 4xy = 7$ should just touch the unit circle at the points in (3), and the level curve $5x^2 + 5y^2 + 4xy = 3$ should just touch it at the points in (4). All of this is consistent with Figure 7.4.5.


[Figure: the unit circle $x^2 + y^2 = 1$ together with the level curves $5x^2 + 5y^2 + 4xy = 7$ and $5x^2 + 5y^2 + 4xy = 3$]


Figure 7.4.5 
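Tangency at the points in (3) and (4) can also be tested numerically. A level curve of $\mathbf{x}^T A\mathbf{x}$ just touches the unit circle at $\mathbf{x}$ when its gradient $2A\mathbf{x}$ is parallel to $\mathbf{x}$, which is exactly the eigenvector condition $A\mathbf{x} = \lambda\mathbf{x}$. The sketch below assumes the matrix of Example 1; the function names are hypothetical.

```python
import math

A = [[5, 2], [2, 5]]   # z = x^T A x = 5x^2 + 5y^2 + 4xy


def q(p):
    x, y = p
    return 5 * x**2 + 5 * y**2 + 4 * x * y


def grad_q(p):
    """Gradient of x^T A x, which equals 2 A x for symmetric A."""
    x, y = p
    return (2 * (A[0][0] * x + A[0][1] * y), 2 * (A[1][0] * x + A[1][1] * y))


def touches_unit_circle(p, tol=1e-12):
    """True if p lies on the unit circle and the level curve of q through p is
    tangent to the circle there, i.e. grad q(p) is parallel to the circle's normal p."""
    on_circle = abs(p[0] ** 2 + p[1] ** 2 - 1) < tol
    gx, gy = grad_q(p)
    parallel = abs(gx * p[1] - gy * p[0]) < tol   # 2-D cross product of grad q and p
    return on_circle and parallel


r = 1 / math.sqrt(2)
for p in [(r, r), (-r, -r), (-r, r), (r, -r), (1.0, 0.0)]:
    print(p, round(q(p), 9), touches_unit_circle(p))
```

The point $(1, 0)$ is included as a contrast: it lies on the unit circle, but the level curve through it crosses the circle rather than touching it.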


CALCULUS REQUIRED 

Relative Extrema of Functions of Two Variables 

We will conclude this section by showing how quadratic forms can be used to study characteristics of real-valued 
functions of two variables. 

Recall that if a function $f(x, y)$ has first-order partial derivatives, then its relative maxima and minima, if any, occur at points where

$$f_x(x, y) = 0 \quad \text{and} \quad f_y(x, y) = 0$$

These are called critical points of $f$. The specific behavior of $f$ at a critical point $(x_0, y_0)$ is determined by the sign of

$$D(x, y) = f(x, y) - f(x_0, y_0) \tag{5}$$

at points $(x, y)$ that are close to, but different from, $(x_0, y_0)$:

If $D(x, y) > 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) < f(x, y)$ at such points and $f$ is said to have a relative minimum at $(x_0, y_0)$ (Figure 7.4.6a).

If $D(x, y) < 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) > f(x, y)$ at such points and $f$ is said to have a relative maximum at $(x_0, y_0)$ (Figure 7.4.6b).

If $D(x, y)$ has both positive and negative values inside every circle centered at $(x_0, y_0)$, then there are points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) < f(x, y)$ and points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) > f(x, y)$. In this case we say that $f$ has a saddle point at $(x_0, y_0)$ (Figure 7.4.6c).





Figure 7.4.6 (a) Relative minimum at (0, 0). (b) Relative maximum at (0, 0). (c) Saddle point at (0, 0).

In general, it can be difficult to determine the sign of (5) directly. However, the following theorem, which is proved in calculus, makes it possible to analyze critical points using derivatives.


THEOREM 7.4.2 Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that $f$ has continuous second-order partial derivatives in some circular region centered at $(x_0, y_0)$. Then:

(a) $f$ has a relative minimum at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) > 0$$

(b) $f$ has a relative maximum at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) < 0$$

(c) $f$ has a saddle point at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) < 0$$

(d) The test is inconclusive if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) = 0$$
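The four cases of this theorem translate directly into code. This is a minimal sketch that assumes the second partial derivatives have already been evaluated at the critical point; the function name and the returned labels are our own, and the sample functions in the usage lines were chosen only for illustration.

```python
def second_derivative_test(fxx, fyy, fxy):
    """Classify a critical point (x0, y0) from f_xx, f_yy, f_xy evaluated there,
    following parts (a)-(d) of the Second Derivative Test."""
    disc = fxx * fyy - fxy ** 2
    if disc > 0 and fxx > 0:
        return "relative minimum"
    if disc > 0 and fxx < 0:
        return "relative maximum"
    if disc < 0:
        return "saddle point"
    return "inconclusive"


# f(x, y) = x^2 + xy + y^2 at (0, 0): f_xx = 2, f_yy = 2, f_xy = 1
print(second_derivative_test(2, 2, 1))
# f(x, y) = x^2 - y^2 at (0, 0): f_xx = 2, f_yy = -2, f_xy = 0
print(second_derivative_test(2, -2, 0))
# f(x, y) = x^4 + y^4 at (0, 0): all second partials vanish
print(second_derivative_test(0, 0, 0))
```

The last call shows why part (d) matters: $x^4 + y^4$ clearly has a minimum at the origin, yet the test cannot detect it.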


Our interest here is in showing how to reformulate this theorem using properties of symmetric matrices. For this purpose we consider the symmetric matrix

$$H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix}$$

which is called the Hessian or Hessian matrix of $f$ in honor of the German mathematician and scientist Ludwig Otto Hesse (1811-1874). The notation $H(x, y)$ emphasizes that the entries in the matrix depend on $x$ and $y$. The Hessian is of interest because

$$\det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0)$$

is the expression that appears in Theorem 7.4.2. We can now reformulate the second derivative test as follows.


THEOREM 7.4.3 Hessian Form of the Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that $f$ has continuous second-order partial derivatives in some circular region centered at $(x_0, y_0)$. If $H(x_0, y_0)$ is the Hessian of $f$ at $(x_0, y_0)$, then:

(a) $f$ has a relative minimum at $(x_0, y_0)$ if $H(x_0, y_0)$ is positive definite.

(b) $f$ has a relative maximum at $(x_0, y_0)$ if $H(x_0, y_0)$ is negative definite.

(c) $f$ has a saddle point at $(x_0, y_0)$ if $H(x_0, y_0)$ is indefinite.

(d) The test is inconclusive otherwise.


We will prove part (a). The proofs of the remaining parts will be left as exercises.

If $H(x_0, y_0)$ is positive definite, then Theorem 7.3.4 implies that the principal submatrices of $H(x_0, y_0)$ have positive determinants. Thus,

$$\det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0$$

and

$$\det[f_{xx}(x_0, y_0)] = f_{xx}(x_0, y_0) > 0$$

so $f$ has a relative minimum at $(x_0, y_0)$ by part (a) of Theorem 7.4.2.


EXAMPLE 4 Using the Hessian to Classify Relative Extrema 

Find the critical points of the function

$$f(x, y) = \tfrac{1}{3}x^3 + xy^2 - 8xy + 3$$

and use the eigenvalues of the Hessian matrix at those points to determine which of them, if any, are relative maxima, relative minima, or saddle points.

Solution To find both the critical points and the Hessian matrix we will need to calculate the first and second partial derivatives of $f$. These derivatives are

$$f_x(x, y) = x^2 + y^2 - 8y, \quad f_y(x, y) = 2xy - 8x, \quad f_{xy}(x, y) = 2y - 8$$

$$f_{xx}(x, y) = 2x, \quad f_{yy}(x, y) = 2x$$

Thus, the Hessian matrix is

$$H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 2x & 2y - 8 \\ 2y - 8 & 2x \end{bmatrix}$$

To find the critical points we set $f_x$ and $f_y$ equal to zero. This yields the equations

$$f_x(x, y) = x^2 + y^2 - 8y = 0 \quad \text{and} \quad f_y(x, y) = 2xy - 8x = 2x(y - 4) = 0$$

Solving the second equation yields $x = 0$ or $y = 4$. Substituting $x = 0$ in the first equation and solving for $y$ yields $y = 0$ or $y = 8$; and substituting $y = 4$ into the first equation and solving for $x$ yields $x = 4$ or $x = -4$. Thus, we have four critical points:

$$(0, 0), \quad (0, 8), \quad (4, 4), \quad (-4, 4)$$

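Evaluating the Hessian matrix at these points yields

$$H(0, 0) = \begin{bmatrix} 0 & -8 \\ -8 & 0 \end{bmatrix}, \qquad H(0, 8) = \begin{bmatrix} 0 & 8 \\ 8 & 0 \end{bmatrix}$$

$$H(4, 4) = \begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix}, \qquad H(-4, 4) = \begin{bmatrix} -8 & 0 \\ 0 & -8 \end{bmatrix}$$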


We leave it for you to find the eigenvalues of these matrices and deduce the following classifications of the critical points:

$$\begin{array}{cccl} \text{Critical point } (x_0, y_0) & \lambda_1 & \lambda_2 & \text{Classification} \\ \hline (0, 0) & 8 & -8 & \text{Saddle point} \\ (0, 8) & 8 & -8 & \text{Saddle point} \\ (4, 4) & 8 & 8 & \text{Relative minimum} \\ (-4, 4) & -8 & -8 & \text{Relative maximum} \end{array}$$
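The computations of Example 4 can be reproduced in a few lines. This sketch hard-codes the partial derivatives found above, confirms that each listed point is critical, and classifies it by the signs of the Hessian's eigenvalues as in Theorem 7.4.3; the helper names are hypothetical.

```python
import math


def grad(x, y):
    """First partial derivatives (f_x, f_y) of f(x, y) = x^3/3 + x y^2 - 8 x y + 3."""
    return (x**2 + y**2 - 8 * y, 2 * x * y - 8 * x)


def hessian(x, y):
    """H(x, y) = [[f_xx, f_xy], [f_xy, f_yy]] for the same f."""
    return [[2 * x, 2 * y - 8], [2 * y - 8, 2 * x]]


def eigenvalues_sym2(H):
    """Eigenvalues (largest first) of a symmetric 2 x 2 matrix."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    mean, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
    return mean + radius, mean - radius


def classify(lam1, lam2):
    """Theorem 7.4.3 via the sign pattern of the Hessian's eigenvalues (lam1 >= lam2)."""
    if lam2 > 0:
        return "relative minimum"
    if lam1 < 0:
        return "relative maximum"
    if lam1 > 0 and lam2 < 0:
        return "saddle point"
    return "inconclusive"


results = {}
for point in [(0, 0), (0, 8), (4, 4), (-4, 4)]:
    if grad(*point) != (0, 0):
        raise ValueError(f"{point} is not a critical point")
    lam1, lam2 = eigenvalues_sym2(hessian(*point))
    results[point] = (lam1, lam2, classify(lam1, lam2))
    print(point, results[point])
```

The printed eigenvalues and labels should match the table above row for row.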


OPTIONAL 

We conclude this section with an optional proof of Theorem 7.4.1. 

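The first step in the proof is to show that $\mathbf{x}^T A\mathbf{x}$ has constrained maximum and minimum values for $\|\mathbf{x}\| = 1$. Since $A$ is symmetric, the principal axes theorem (Theorem 7.3.1) implies that there is an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ such that

$$\mathbf{x}^T A\mathbf{x} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \tag{6}$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$. Let us assume that $\|\mathbf{x}\| = 1$ and that the column vectors of $P$ (which are unit eigenvectors of $A$) have been ordered so that

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \tag{7}$$

Since the matrix $P$ is orthogonal, multiplication by $P$ is length preserving, so that $\|\mathbf{y}\| = \|\mathbf{x}\| = 1$; that is,

$$y_1^2 + y_2^2 + \cdots + y_n^2 = 1$$

It follows from this equation and (7) that

$$\lambda_n = \lambda_n(y_1^2 + y_2^2 + \cdots + y_n^2) \le \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \le \lambda_1(y_1^2 + y_2^2 + \cdots + y_n^2) = \lambda_1$$

and hence from (6) that

$$\lambda_n \le \mathbf{x}^T A\mathbf{x} \le \lambda_1$$

This shows that all values of $\mathbf{x}^T A\mathbf{x}$ for which $\|\mathbf{x}\| = 1$ lie between the largest and smallest eigenvalues of $A$. Now let $\mathbf{x}$ be a unit eigenvector corresponding to $\lambda_1$. Then

$$\mathbf{x}^T A\mathbf{x} = \mathbf{x}^T(\lambda_1\mathbf{x}) = \lambda_1\mathbf{x}^T\mathbf{x} = \lambda_1\|\mathbf{x}\|^2 = \lambda_1$$

which shows that $\mathbf{x}^T A\mathbf{x}$ has $\lambda_1$ as a constrained maximum and that this maximum occurs if $\mathbf{x}$ is a unit eigenvector of $A$ corresponding to $\lambda_1$. Similarly, if $\mathbf{x}$ is a unit eigenvector corresponding to $\lambda_n$, then

$$\mathbf{x}^T A\mathbf{x} = \mathbf{x}^T(\lambda_n\mathbf{x}) = \lambda_n\mathbf{x}^T\mathbf{x} = \lambda_n\|\mathbf{x}\|^2 = \lambda_n$$

so $\mathbf{x}^T A\mathbf{x}$ has $\lambda_n$ as a constrained minimum and this minimum occurs if $\mathbf{x}$ is a unit eigenvector of $A$ corresponding to $\lambda_n$. This completes the proof.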
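The inequality $\lambda_n \le \mathbf{x}^T A\mathbf{x} \le \lambda_1$ can be illustrated numerically. The sketch assumes a particular symmetric $3 \times 3$ matrix built as $A = PDP^T$ from a rotation $P$ and the eigenvalues 9, 4, and $-2$ (all arbitrary choices), samples random unit vectors, and checks that every value stays between the smallest and largest eigenvalues, with the bounds reached at the columns of $P$. Sampling illustrates the result; it does not replace the proof.

```python
import math
import random


def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def quad(M, v):
    """The quadratic form v^T M v."""
    return sum(vi * wi for vi, wi in zip(v, mat_vec(M, v)))


# Symmetric A = P D P^T from an orthogonal P (a rotation about the z-axis)
# and chosen eigenvalues lambda_1 >= lambda_2 >= lambda_3.
theta = 0.7
P = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta), math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]
eigs = [9.0, 4.0, -2.0]
A = [[sum(P[i][k] * eigs[k] * P[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

random.seed(0)
values = []
for _ in range(10_000):
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    values.append(quad(A, [c / norm for c in v]))   # value of x^T A x at a random unit x

print(min(values), max(values))   # should stay inside [lambda_3, lambda_1] = [-2, 9]


def column(k):
    return [P[i][k] for i in range(3)]


# At the unit eigenvectors (columns of P) the bounds are attained.
print(quad(A, column(0)), quad(A, column(2)))
```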










Concept Review 

Constraint 

Constrained extremum 
Level curve 
Critical point 
Relative minimum 
Relative maximum 
Saddle point 
Second derivative test 
Hessian matrix 

Skills 

Find the maximum and minimum values of a quadratic form subject to a constraint. 

Find the critical points of a real-valued function of two variables, and use the eigenvalues of the Hessian 
matrix at the critical points to classify them as relative maxima, relative minima, or saddle points. 


Exercise Set 7.4 

In Exercises 1-4, find the maximum and minimum values of the given quadratic form subject to the constraint $x^2 + y^2 = 1$, and determine the values of $x$ and $y$ at which the maximum and minimum occur.

1. $5x^2 - y^2$

Answer:

Maximum: 5 at $(1, 0)$ and $(-1, 0)$; minimum: $-1$ at $(0, 1)$ and $(0, -1)$

2. xy 

3. $3x^2 + 7y^2$
Answer:

Maximum: 7 at $(0, 1)$ and $(0, -1)$; minimum: 3 at $(1, 0)$ and $(-1, 0)$

4. $5x^2 + 5xy$

In Exercises 5-6, find the maximum and minimum values of the given quadratic form subject to the constraint 

$$x^2 + y^2 + z^2 = 1$$

and determine the values of x, y, and z at which the maximum and minimum occur. 

5. $9x^2 + 4y^2 + 3z^2$


Answer: 


Maximum: 9 at (1, 0, 0) and (-1,0, 0); minimum: 3 at (0, 0, 1) and (0, 0,-1) 

6. $2x^2 + y^2 + z^2 + 2xy + 2xz$

7. Use the method of Example 2 to find the maximum and minimum values of $xy$ subject to the constraint $4x^2 + 8y^2 = 16$.

Answer:

Maximum: $z = \sqrt{2}$ at $(x, y) = (\sqrt{2}, 1)$ and $(-\sqrt{2}, -1)$; minimum: $z = -\sqrt{2}$ at $(x, y) = (-\sqrt{2}, 1)$ and $(\sqrt{2}, -1)$

8. Use the method of Example 2 to find the maximum and minimum values of $x^2 + xy + 2y^2$ subject to the constraint $x^2 + 3y^2 = 16$.

In Exercises 9-10, draw the unit circle and the level curves corresponding to the given quadratic form. Show that 
the unit circle intersects each of these curves in exactly two places, label the intersection points, and verify that 
the constrained extrema occur at those points. 

9. $5x^2 - y^2$
Answer: 



10. xy 

11. (a) Show that the function $f(x, y) = 4xy - x^4 - y^4$ has critical points at $(0, 0)$, $(1, 1)$, and $(-1, -1)$.

(b) Use the Hessian form of the second derivative test to show that $f$ has relative maxima at $(1, 1)$ and $(-1, -1)$ and a saddle point at $(0, 0)$.

12. (a) Show that the function $f(x, y) = x^3 - 6xy - y^3$ has critical points at $(0, 0)$ and $(-2, 2)$.

(b) Use the Hessian form of the second derivative test to show that $f$ has a relative maximum at $(-2, 2)$ and a saddle point at $(0, 0)$.

In Exercises 13-16, find the critical points of $f$, if any, and classify them as relative maxima, relative minima, or saddle points.




13. $f(x, y) = x^3 - 3xy - y^3$
Answer:

Critical points: $(-1, 1)$, relative maximum; $(0, 0)$, saddle point

14. $f(x, y) = x^3 - 3xy + y^3$

15. $f(x, y) = x^2 + 2y^2 - x^2y$
Answer:


Critical points: $(0, 0)$, relative minimum; $(2, 1)$ and $(-2, 1)$, saddle points

16. $f(x, y) = x^3 + y^3 - 3x - 3y$

17. A rectangle whose center is at the origin and whose sides are parallel to the coordinate axes is to be inscribed in the ellipse $x^2 + 25y^2 = 25$. Use the method of Example 2 to find nonnegative values of $x$ and $y$ that produce the inscribed rectangle with maximum area.

Answer:


Corner points: $x = 5/\sqrt{2}$, $y = 1/\sqrt{2}$

18. Suppose that the temperature at a point $(x, y)$ on a metal plate is $T(x, y) = 4x^2 - 4xy + y^2$. An ant, walking on the plate, traverses a circle of radius 5 centered at the origin. What are the highest and lowest temperatures encountered by the ant?

19. (a) Show that the functions

$$f(x, y) = x^4 + y^4 \quad \text{and} \quad g(x, y) = x^4 - y^4$$

have a critical point at $(0, 0)$ but the second derivative test is inconclusive at that point.

(b) Give a reasonable argument to show that $f$ has a relative minimum at $(0, 0)$ and $g$ has a saddle point at $(0, 0)$.


20. Suppose that the Hessian matrix of a certain quadratic form $f(x, y)$ is



What can you say about the location and classification of the critical points of $f$?

21. Suppose that $A$ is an $n \times n$ symmetric matrix and

$$q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$$

where $\mathbf{x}$ is a vector in $R^n$ that is expressed in column form. What can you say about the value of $q$ if $\mathbf{x}$ is a unit eigenvector corresponding to an eigenvalue $\lambda$ of $A$?


Answer:

$q(\mathbf{x}) = \lambda$



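22. Prove: If $\mathbf{x}^T A\mathbf{x}$ is a quadratic form whose minimum and maximum values subject to the constraint $\|\mathbf{x}\| = 1$ are $m$ and $M$, respectively, then for each number $c$ in the interval $m \le c \le M$, there is a unit vector $\mathbf{x}_c$ such that $\mathbf{x}_c^T A\mathbf{x}_c = c$. [Hint: In the case where $m < M$, let $\mathbf{u}_m$ and $\mathbf{u}_M$ be unit eigenvectors of $A$ such that $\mathbf{u}_m^T A\mathbf{u}_m = m$ and $\mathbf{u}_M^T A\mathbf{u}_M = M$, and let

$$\mathbf{x}_c = \sqrt{\frac{c - m}{M - m}}\,\mathbf{u}_M + \sqrt{\frac{M - c}{M - m}}\,\mathbf{u}_m$$

Show that $\mathbf{x}_c^T A\mathbf{x}_c = c$.]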


True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) A quadratic form must have either a maximum or minimum value. 

Answer: 

False 

(b) The maximum value of a quadratic form $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ occurs at a unit eigenvector corresponding to the largest eigenvalue of $A$.

Answer: 

True 

(c) The Hessian matrix of a function $f$ with continuous second-order partial derivatives is a symmetric matrix.
Answer: 

True 

(d) If $(x_0, y_0)$ is a critical point of a function $f$ and the Hessian of $f$ at $(x_0, y_0)$ is 0, then $f$ has neither a relative maximum nor a relative minimum at $(x_0, y_0)$.

Answer: 

False 

(e) If $A$ is a symmetric matrix and $\det A < 0$, then the minimum of $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ is negative.

Answer: 

True 








7.5 Hermitian, Unitary, and Normal Matrices 

We know that every real symmetric matrix is orthogonally diagonalizable and that the real symmetric matrices 
are the only orthogonally diagonalizable matrices. In this section we will consider the diagonalization problem 
for complex matrices. 


Hermitian and Unitary Matrices 

The transpose operation is less important for complex matrices than for real matrices. A more useful operation 
for complex matrices is given in the following definition. 


DEFINITION 1 

If $A$ is a complex matrix, then the conjugate transpose of $A$, denoted by $A^*$, is defined by

$$A^* = \overline{A}^T \tag{1}$$


Since part (b) of Theorem 5.3.2 states that $\overline{A^T} = (\overline{A})^T$, the order in which the transpose and conjugation operations are performed in computing $A^* = \overline{A}^T$ does not matter. Moreover, in the case where $A$ has real entries we have $A^* = (\overline{A})^T = A^T$, so $A^*$ is the same as $A^T$ for real matrices.


EXAMPLE 1 Conjugate Transpose 

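Find the conjugate transpose $A^*$ of the matrix

$$A = \begin{bmatrix} 1 + i & -i & 0 \\ 2 & 3 - 2i & i \end{bmatrix}$$

Solution We have

$$\overline{A} = \begin{bmatrix} 1 - i & i & 0 \\ 2 & 3 + 2i & -i \end{bmatrix} \quad \text{and hence} \quad A^* = \overline{A}^T = \begin{bmatrix} 1 - i & 2 \\ i & 3 + 2i \\ 0 & -i \end{bmatrix}$$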
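Formula (1) is easy to express in code. The following is a small sketch, assuming matrices are stored as Python lists of rows; the function name `conjugate_transpose` is ours. It reproduces the result of Example 1.

```python
def conjugate_transpose(A):
    """Return A* (the transpose of the entrywise conjugate) for a matrix stored as nested lists."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)] for j in range(cols)]


A = [[1 + 1j, -1j, 0],
     [2, 3 - 2j, 1j]]   # the matrix of Example 1

A_star = conjugate_transpose(A)
for row in A_star:
    print(row)
```

Because conjugation is applied entrywise, it does not matter whether it is done before or after transposing, which is the point made after Definition 1.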


The following theorem, parts of which are given as exercises, shows that the basic algebraic 
properties of the conjugate transpose operation are similar to those of the transpose (compare to 
Theorem 1.4.8). 








THEOREM 7.5.1 


If $k$ is a complex scalar, and if $A$, $B$, and $C$ are complex matrices whose sizes are such that the stated operations can be performed, then:

(a) $(A^*)^* = A$

(b) $(A + B)^* = A^* + B^*$

(c) $(A - B)^* = A^* - B^*$

(d) $(kA)^* = \overline{k}A^*$

(e) $(AB)^* = B^*A^*$
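Parts (a), (d), and (e) of Theorem 7.5.1 can be spot-checked on concrete matrices. The matrix `B` and the scalar `k` below are arbitrary test data chosen for this sketch; because every real and imaginary part is a small integer, the floating-point arithmetic is exact and plain equality comparisons are safe. A check on one example is an illustration, not a proof.

```python
def ctrans(A):
    """Conjugate transpose A*."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def scale(k, A):
    return [[k * a for a in row] for row in A]


A = [[1 + 1j, -1j, 0], [2, 3 - 2j, 1j]]     # 2 x 3
B = [[1, 2j], [0, 1 - 1j], [4 + 1j, -3]]    # 3 x 2
k = 2 - 5j

print("(a)", ctrans(ctrans(A)) == A)
print("(d)", ctrans(scale(k, A)) == scale(k.conjugate(), ctrans(A)))
print("(e)", ctrans(matmul(A, B)) == matmul(ctrans(B), ctrans(A)))
```

Note the reversed order in part (e): `matmul(ctrans(A), ctrans(B))` would not even be defined here, since it would multiply a $3 \times 2$ matrix by a $2 \times 3$ one in the wrong order for this identity.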


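Note that the relationship $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v}}^T\mathbf{u}$ in Formula 5 of Section 5.3 can be expressed in terms of the conjugate transpose as

$$\mathbf{u} \cdot \mathbf{v} = \mathbf{v}^*\mathbf{u} \tag{2}$$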


We are now ready to define two new classes of matrices that will be important in our study of diagonalization in $C^n$.


DEFINITION 2 

A square complex matrix $A$ is said to be unitary if

$$A^{-1} = A^* \tag{3}$$

and is said to be Hermitian if

$$A^* = A \tag{4}$$

Note that a unitary matrix can also be defined as a square complex matrix $A$ for which

$$AA^* = A^*A = I$$


If $A$ is a real matrix, then $A^* = A^T$, in which case (3) becomes $A^{-1} = A^T$ and (4) becomes $A^T = A$. Thus, the unitary matrices are complex generalizations of the real orthogonal matrices and Hermitian matrices are complex generalizations of the real symmetric matrices.

EXAMPLE 2 Recognizing Hermitian Matrices 


Hermitian matrices are easy to recognize because their diagonal entries are real (why?), and the 
entries that are symmetrically positioned across the main diagonal are complex conjugates. Thus, 
for example, we can tell by inspection that 


$$A = \begin{bmatrix} 1 & i & 1 + i \\ -i & -5 & 2 - i \\ 1 - i & 2 + i & 3 \end{bmatrix}$$

is Hermitian.
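Definition 2 suggests two simple mechanical tests. The sketch below, with hypothetical helper names and a tolerance for round-off, checks formula (4) for the matrix of Example 2 and the condition $AA^* = A^*A = I$ for unitarity.

```python
def ctrans(A):
    """Conjugate transpose A*."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def close(X, Y, tol=1e-12):
    """Entrywise comparison with a tolerance for floating-point round-off."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def is_hermitian(A, tol=1e-12):
    return close(ctrans(A), A, tol)          # formula (4): A* = A


def is_unitary(A, tol=1e-12):
    n = len(A)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    A_star = ctrans(A)
    return close(matmul(A, A_star), identity, tol) and close(matmul(A_star, A), identity, tol)


H = [[1, 1j, 1 + 1j],
     [-1j, -5, 2 - 1j],
     [1 - 1j, 2 + 1j, 3]]        # the matrix of Example 2
K = [[0, 1j], [1j, 0]]           # an extra test matrix chosen for this sketch
print(is_hermitian(H), is_unitary(H))
print(is_hermitian(K), is_unitary(K))
```

The extra matrix `K` is unitary but not Hermitian, so the two classes are genuinely different.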


The fact that real symmetric matrices have real eigenvalues is a special case of the following more general 
result about Hermitian matrices, the proof of which is left for the exercises. 


THEOREM 7.5.2 

The eigenvalues of a Hermitian matrix are real numbers. 


The fact that eigenvectors from different eigenspaces of a real symmetric matrix are orthogonal is a special 
case of the following more general result about Hermitian matrices. 


THEOREM 7.5.3 

If A is a Hermitian matrix, then eigenvectors from different eigenspaces are orthogonal. 


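Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors of $A$ corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$. Using Formula (2) and the facts that $\overline{\lambda}_1 = \lambda_1$ and $\overline{\lambda}_2 = \lambda_2$ (Theorem 7.5.2) and $A = A^*$, we can write

$$\begin{aligned} \lambda_1(\mathbf{v}_2 \cdot \mathbf{v}_1) &= (\lambda_1\mathbf{v}_1)^*\mathbf{v}_2 = (A\mathbf{v}_1)^*\mathbf{v}_2 = (\mathbf{v}_1^*A^*)\mathbf{v}_2 \\ &= (\mathbf{v}_1^*A)\mathbf{v}_2 = \mathbf{v}_1^*(A\mathbf{v}_2) \\ &= \mathbf{v}_1^*(\lambda_2\mathbf{v}_2) = \lambda_2(\mathbf{v}_1^*\mathbf{v}_2) = \lambda_2(\mathbf{v}_2 \cdot \mathbf{v}_1) \end{aligned}$$

This implies that $(\lambda_1 - \lambda_2)(\mathbf{v}_2 \cdot \mathbf{v}_1) = 0$ and hence that $\mathbf{v}_2 \cdot \mathbf{v}_1 = 0$ (since $\lambda_1 \neq \lambda_2$).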

EXAMPLE 3 Eigenvalues and Eigenvectors of a Hermitian Matrix 

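Confirm that the Hermitian matrix

$$A = \begin{bmatrix} 2 & 1 + i \\ 1 - i & 3 \end{bmatrix}$$

has real eigenvalues and that eigenvectors from different eigenspaces are orthogonal.

Solution The characteristic polynomial of $A$ is

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & -1 - i \\ -1 + i & \lambda - 3 \end{vmatrix} = (\lambda - 2)(\lambda - 3) - (-1 - i)(-1 + i) = (\lambda^2 - 5\lambda + 6) - 2 = (\lambda - 1)(\lambda - 4)$$

so the eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 4$, which are real. Bases for the eigenspaces of $A$ can be obtained by solving the linear system

$$\begin{bmatrix} \lambda - 2 & -1 - i \\ -1 + i & \lambda - 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

with $\lambda = 1$ and with $\lambda = 4$. We leave it for you to do this and to show that the general solutions of these systems are

$$\lambda = 1:\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} -1 - i \\ 1 \end{bmatrix} \quad \text{and} \quad \lambda = 4:\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} \frac{1}{2}(1 + i) \\ 1 \end{bmatrix}$$

Thus, bases for these eigenspaces are

$$\lambda = 1:\ \mathbf{v}_1 = \begin{bmatrix} -1 - i \\ 1 \end{bmatrix} \quad \text{and} \quad \lambda = 4:\ \mathbf{v}_2 = \begin{bmatrix} \frac{1}{2}(1 + i) \\ 1 \end{bmatrix}$$

The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal since

$$\mathbf{v}_1 \cdot \mathbf{v}_2 = (-1 - i)\overline{\left(\tfrac{1}{2}(1 + i)\right)} + (1)(\overline{1}) = \tfrac{1}{2}(-1 - i)(1 - i) + 1 = -1 + 1 = 0$$

and hence all scalar multiples of them are also orthogonal.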
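Example 3 can be confirmed numerically. This sketch, assuming the matrix of Example 3, solves the quadratic characteristic equation with `cmath.sqrt`, checks that $A\mathbf{v}_1 = \mathbf{v}_1$ and $A\mathbf{v}_2 = 4\mathbf{v}_2$, and evaluates the complex Euclidean inner product of the two eigenvectors; the helpers `dot` and `mat_vec` are our own.

```python
import cmath

A = [[2, 1 + 1j], [1 - 1j, 3]]   # Hermitian matrix of Example 3

# For a 2 x 2 matrix, det(lambda I - A) = lambda^2 - tr(A) lambda + det(A).
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = cmath.sqrt(tr * tr - 4 * det)
eigenvalues = [(tr - root) / 2, (tr + root) / 2]
print(eigenvalues)       # imaginary parts vanish, as Theorem 7.5.2 predicts


def dot(u, v):
    """Complex Euclidean inner product u . v = u_1 conj(v_1) + ... + u_n conj(v_n)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


v1 = [-1 - 1j, 1]           # eigenspace basis for lambda = 1 (from the text)
v2 = [(1 + 1j) / 2, 1]      # eigenspace basis for lambda = 4 (from the text)
print(mat_vec(A, v1), mat_vec(A, v2))   # compare with 1 * v1 and 4 * v2
print(dot(v1, v2))                       # zero, so v1 and v2 are orthogonal
```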


Unitary matrices are not usually easy to recognize by inspection. However, the following analog of Theorems 7.1.1 and 7.1.3, part of which is proved in the exercises, provides a way of ascertaining whether a matrix is unitary without computing its inverse.


THEOREM 7.5.4 

If $A$ is an $n \times n$ matrix with complex entries, then the following are equivalent.

(a) $A$ is unitary.

(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $C^n$.

(c) $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $C^n$.

(d) The column vectors of $A$ form an orthonormal set in $C^n$ with respect to the complex Euclidean inner product.

(e) The row vectors of $A$ form an orthonormal set in $C^n$ with respect to the complex Euclidean inner product.


EXAMPLE 4 A Unitary Matrix 

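Use Theorem 7.5.4 to show that

$$A = \begin{bmatrix} \frac{1}{2}(1 + i) & \frac{1}{2}(1 + i) \\ \frac{1}{2}(1 - i) & \frac{1}{2}(-1 + i) \end{bmatrix}$$

is unitary, and then find $A^{-1}$.

Solution We will show that the row vectors

$$\mathbf{r}_1 = \left[\tfrac{1}{2}(1 + i) \quad \tfrac{1}{2}(1 + i)\right] \quad \text{and} \quad \mathbf{r}_2 = \left[\tfrac{1}{2}(1 - i) \quad \tfrac{1}{2}(-1 + i)\right]$$

are orthonormal. The relevant computations are

$$\|\mathbf{r}_1\| = \sqrt{\left|\tfrac{1}{2}(1 + i)\right|^2 + \left|\tfrac{1}{2}(1 + i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1$$

$$\|\mathbf{r}_2\| = \sqrt{\left|\tfrac{1}{2}(1 - i)\right|^2 + \left|\tfrac{1}{2}(-1 + i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1$$

$$\begin{aligned} \mathbf{r}_1 \cdot \mathbf{r}_2 &= \left(\tfrac{1}{2}(1 + i)\right)\overline{\left(\tfrac{1}{2}(1 - i)\right)} + \left(\tfrac{1}{2}(1 + i)\right)\overline{\left(\tfrac{1}{2}(-1 + i)\right)} \\ &= \left(\tfrac{1}{2}(1 + i)\right)\left(\tfrac{1}{2}(1 + i)\right) + \left(\tfrac{1}{2}(1 + i)\right)\left(\tfrac{1}{2}(-1 - i)\right) = \tfrac{1}{2}i - \tfrac{1}{2}i = 0 \end{aligned}$$

Since we now know that $A$ is unitary, it follows that

$$A^{-1} = A^* = \begin{bmatrix} \frac{1}{2}(1 - i) & \frac{1}{2}(1 + i) \\ \frac{1}{2}(1 - i) & \frac{1}{2}(-1 - i) \end{bmatrix}$$

You can confirm the validity of this result by showing that $AA^* = A^*A = I$.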
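The orthonormality computations of Example 4, and the resulting formula $A^{-1} = A^*$, can be replayed in code. In this sketch the rows of $A$ are Python lists and `dot` is a helper we define for the complex Euclidean inner product; every entry is a multiple of $\tfrac{1}{2}$, so the arithmetic is exact.

```python
A = [[(1 + 1j) / 2, (1 + 1j) / 2],
     [(1 - 1j) / 2, (-1 + 1j) / 2]]   # the matrix of Example 4


def dot(u, v):
    """Complex Euclidean inner product u . v = sum of u_k * conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


r1, r2 = A
print(dot(r1, r1), dot(r2, r2), dot(r1, r2))   # squared norms 1 and 1, inner product 0

# Theorem 7.5.4 then gives A^{-1} = A*; confirm it by forming A A*.
A_star = [[A[j][i].conjugate() for j in range(2)] for i in range(2)]
product = [[sum(A[i][k] * A_star[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(A_star)
print(product)
```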


Unitary Diagonalizability 


Since unitary matrices are the complex analogs of the real orthogonal matrices, the following definition is a 
natural generalization of orthogonal diagonalizability for real matrices. 


DEFINITION 3 


A square complex matrix $A$ is said to be unitarily diagonalizable if there is a unitary matrix $P$ such that $P^*AP = D$ is a complex diagonal matrix. Any such matrix $P$ is said to unitarily diagonalize $A$.


Recall that a real symmetric n x n matrix A has an orthonormal set of n eigenvectors and is orthogonally 
diagonalized by any n x n matrix whose column vectors are an orthonormal set of eigenvectors of A. Here is 
the complex analog of that result. 

THEOREM 7.5.5 

Every $n \times n$ Hermitian matrix $A$ has an orthonormal set of $n$ eigenvectors and is unitarily diagonalized by any $n \times n$ matrix $P$ whose column vectors form an orthonormal set of eigenvectors of $A$.

The procedure for unitarily diagonalizing a Hermitian matrix A is exactly the same as that for orthogonally 
diagonalizing a symmetric matrix: 


Unitarily Diagonalizing a Hermitian Matrix 


Step 1. Find a basis for each eigenspace of A. 

Step 2. Apply the Gram-Schmidt process to each of these bases to obtain orthonormal bases for the 
eigenspaces. 




Step 3. Form the matrix P whose column vectors are the basis vectors obtained in Step 2. This will 
be a unitary matrix (Theorem 7.5.4) and will unitarily diagonalize A. 


EXAMPLE 5 Unitary Diagonalization of a Hermitian Matrix 

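Find a matrix $P$ that unitarily diagonalizes the Hermitian matrix

$$A = \begin{bmatrix} 2 & 1 + i \\ 1 - i & 3 \end{bmatrix}$$

Solution We showed in Example 3 that the eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 4$ and that bases for the corresponding eigenspaces are

$$\lambda = 1:\ \mathbf{v}_1 = \begin{bmatrix} -1 - i \\ 1 \end{bmatrix} \quad \text{and} \quad \lambda = 4:\ \mathbf{v}_2 = \begin{bmatrix} \frac{1}{2}(1 + i) \\ 1 \end{bmatrix}$$

Since each eigenspace has only one basis vector, the Gram-Schmidt process is simply a matter of normalizing these basis vectors. We leave it for you to show that

$$\mathbf{p}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \begin{bmatrix} \frac{-1 - i}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix} \quad \text{and} \quad \mathbf{p}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \begin{bmatrix} \frac{1 + i}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}$$

Thus, $A$ is unitarily diagonalized by the matrix

$$P = [\mathbf{p}_1 \ \ \mathbf{p}_2] = \begin{bmatrix} \frac{-1 - i}{\sqrt{3}} & \frac{1 + i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix}$$

Although it is a little tedious, you may want to check this result by showing that

$$P^*AP = \begin{bmatrix} \frac{-1 + i}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1 - i}{\sqrt{6}} & \frac{2}{\sqrt{6}} \end{bmatrix}\begin{bmatrix} 2 & 1 + i \\ 1 - i & 3 \end{bmatrix}\begin{bmatrix} \frac{-1 - i}{\sqrt{3}} & \frac{1 + i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$$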
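The check suggested at the end of Example 5 is a natural job for a short script. This sketch forms $P$ from the normalized eigenvectors above and computes $P^*P$ and $P^*AP$; any tiny nonzero off-diagonal entries it prints are floating-point round-off. The helper names are our own.

```python
import math

A = [[2, 1 + 1j], [1 - 1j, 3]]   # Hermitian matrix of Example 5
s3, s6 = math.sqrt(3), math.sqrt(6)
P = [[(-1 - 1j) / s3, (1 + 1j) / s6],
     [1 / s3, 2 / s6]]           # columns p1, p2 from the text


def ctrans(M):
    """Conjugate transpose M*."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))] for j in range(len(M[0]))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


P_star = ctrans(P)
print(matmul(P_star, P))             # the identity up to round-off, so P is unitary
D = matmul(matmul(P_star, A), P)     # diag(1, 4) up to round-off
for row in D:
    print([complex(round(z.real, 12), round(z.imag, 12)) for z in row])
```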




Skew-Symmetric and Skew-Hermitian Matrices 

In Exercise 37 of Section 1.7 we defined a square matrix with real entries to be skew-symmetric if $A^T = -A$. A skew-symmetric matrix must have zeros on the main diagonal (why?), and each entry off the main diagonal must be the negative of its mirror image about the main diagonal. Here is an example.

$$A = \begin{bmatrix} 0 & 1 & -2 \\ -1 & 0 & 4 \\ 2 & -4 & 0 \end{bmatrix} \qquad [\text{skew-symmetric}]$$

We leave it for you to confirm that $A^T = -A$.

The complex analogs of the skew-symmetric matrices are the matrices for which $A^* = -A$. Such matrices are said to be skew-Hermitian.

Since a skew-Hermitian matrix $A$ has the property

$$A^* = \overline{A}^T = -A$$

it must be that $A$ has zeros or pure imaginary numbers on the main diagonal (why?), and that the complex conjugate of each entry off the main diagonal is the negative of its mirror image about the main diagonal. Here is an example.

$$A = \begin{bmatrix} i & 1 - i & 5 \\ -1 - i & 2i & i \\ -5 & i & 0 \end{bmatrix} \qquad [\text{skew-Hermitian}]$$


Normal Matrices 

Hermitian matrices enjoy many, but not all, of the properties of real symmetric matrices. For example, we know that real symmetric matrices are orthogonally diagonalizable and Hermitian matrices are unitarily diagonalizable. However, whereas the real symmetric matrices are the only orthogonally diagonalizable matrices, the Hermitian matrices do not constitute the entire class of unitarily diagonalizable complex matrices; that is, there exist unitarily diagonalizable matrices that are not Hermitian. Specifically, it can be proved that a square complex matrix $A$ is unitarily diagonalizable if and only if

$$AA^* = A^*A$$

Matrices with this property are said to be normal. Normal matrices include the Hermitian, skew-Hermitian, and unitary matrices in the complex case and the symmetric, skew-symmetric, and orthogonal matrices in the real case. The nonzero skew-symmetric matrices are particularly interesting because they are examples of real matrices that are not orthogonally diagonalizable but are unitarily diagonalizable.
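The skew-symmetric example earlier in this section illustrates that last remark. This sketch, assuming that matrix, confirms $AA^T = A^TA$ (so $A$ is normal) and evaluates the characteristic polynomial at $0$ and $\pm i\sqrt{21}$. The value 21 is the sum of squares $1^2 + 2^2 + 4^2$ of the entries above the diagonal, since for a $3 \times 3$ skew-symmetric matrix the characteristic polynomial has the form $\lambda^3 + (a^2 + b^2 + c^2)\lambda$; the sketch checks the roots numerically rather than proving that formula.

```python
import cmath


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


A = [[0, 1, -2], [-1, 0, 4], [2, -4, 0]]            # skew-symmetric example from the text
A_T = [[A[j][i] for j in range(3)] for i in range(3)]
print("normal:", matmul(A, A_T) == matmul(A_T, A))   # for real A, A* = A^T

# det(lambda I - A) = lambda^3 + 21 lambda has the non-real roots +/- i sqrt(21),
# so A cannot be orthogonally diagonalized over the reals.
for lam in (0, 1j * cmath.sqrt(21), -1j * cmath.sqrt(21)):
    char_poly = det3([[lam * (i == j) - A[i][j] for j in range(3)] for i in range(3)])
    print(lam, abs(char_poly))
```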


A Comparison of Eigenvalues 

We have seen that Hermitian matrices have real eigenvalues. In the exercises we will ask you to show that the 
eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary (have real part of zero) and that the 
eigenvalues of unitary matrices have modulus 1. These ideas are illustrated schematically in Figure 7.5.1. 






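[Figure: in the complex plane, real eigenvalues (Hermitian) lie on the real axis, pure imaginary eigenvalues (skew-Hermitian) lie on the imaginary axis, and eigenvalues with $|\lambda| = 1$ (unitary) lie on the unit circle]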


Figure 7.5.1 
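The three regions of Figure 7.5.1 can be illustrated with one $2 \times 2$ example of each kind. The sketch assumes the Hermitian matrix of Example 3, the skew-Hermitian matrix of Exercise 24, and a real rotation matrix (which is unitary) with an arbitrary angle; `eig2` is our own helper that applies the quadratic formula to the characteristic polynomial.

```python
import cmath
import math


def eig2(M):
    """Eigenvalues of a 2 x 2 complex matrix from lambda^2 - tr(M) lambda + det(M) = 0."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


theta = 0.9   # arbitrary rotation angle
examples = {
    "Hermitian": [[2, 1 + 1j], [1 - 1j, 3]],
    "skew-Hermitian": [[0, 3j], [3j, 0]],
    "unitary": [[math.cos(theta), -math.sin(theta)],
                [math.sin(theta), math.cos(theta)]],
}
spectra = {name: eig2(M) for name, M in examples.items()}
for name, lams in spectra.items():
    for lam in lams:
        print(f"{name:15s} lambda = {lam:.6f}   |lambda| = {abs(lam):.6f}")
```

The Hermitian eigenvalues come out as 4 and 1, the skew-Hermitian ones as $\pm 3i$, and the rotation's as $\cos\theta \pm i\sin\theta$, which have modulus 1.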


Concept Review 

Conjugate transpose 
Unitary matrix 
Hermitian matrix 
Unitarily diagonalizable matrix 
Skew-symmetric matrix 
Skew-Hermitian matrix 
Normal matrix 

Skills 

Find the conjugate transpose of a matrix. 

Be able to identify Hermitian matrices. 

Find the inverse of a unitary matrix. 

Find a unitary matrix that diagonalizes a Hermitian matrix. 


Exercise Set 7.5 

In Exercises 1-2, find A*. 


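1. $A = \begin{bmatrix} 2i & 1 - i \\ 4 & 3 + i \\ 5 + i & 0 \end{bmatrix}$

Answer:

$A^* = \begin{bmatrix} -2i & 4 & 5 - i \\ 1 + i & 3 - i & 0 \end{bmatrix}$

2. $A = \begin{bmatrix} 2i & 1 - i & -1 + i \\ 4 & 5 - 7i & -i \end{bmatrix}$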

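In Exercises 3-4, substitute numbers for the $\times$'s so that $A$ is Hermitian.

3. $A = \begin{bmatrix} 1 & i & 2 - 3i \\ \times & -3 & 1 \\ \times & \times & 2 \end{bmatrix}$

Answer:

$A = \begin{bmatrix} 1 & i & 2 - 3i \\ -i & -3 & 1 \\ 2 + 3i & 1 & 2 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & 0 & 3 + 5i \\ \times & -4 & -i \\ \times & \times & 6 \end{bmatrix}$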

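In Exercises 5-6, show that $A$ is not Hermitian for any choice of the $\times$'s.

5. (a) $A = \begin{bmatrix} 1 & i & 2 - 3i \\ -i & -3 & \times \\ 2 - 3i & \times & \times \end{bmatrix}$

(b) $A = \begin{bmatrix} \times & \times & 3 + 5i \\ 0 & i & -i \\ 3 - 5i & i & \times \end{bmatrix}$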

Answer: 





In Exercises 9-12, show that $A$ is unitary, and find $A^{-1}$.



Answer: 





In Exercises 13-18, find a unitary matrix $P$ that diagonalizes the Hermitian matrix $A$, and determine $P^{-1}AP$.



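Answer:

$P = \begin{bmatrix} \frac{-1 + i}{\sqrt{3}} & \frac{1 - i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix}$, $P^{-1}AP = \begin{bmatrix} 3 & 0 \\ 0 & 6 \end{bmatrix}$

14. $A = \begin{bmatrix} 3 & i \\ -i & 3 \end{bmatrix}$

15. $A = \begin{bmatrix} 6 & 2 - 2i \\ 2 + 2i & 4 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} \frac{-1 + i}{\sqrt{6}} & \frac{1 - i}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}$, $P^{-1}AP = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix}$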

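16. $A = \begin{bmatrix} 0 & 3 + i \\ 3 - i & -3 \end{bmatrix}$

17. $A = \begin{bmatrix} 5 & 0 & 0 \\ 0 & -1 & -1 + i \\ 0 & -1 - i & 0 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} 0 & 0 & 1 \\ \frac{1 - i}{\sqrt{3}} & \frac{-1 + i}{\sqrt{6}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} & 0 \end{bmatrix}$, $P^{-1}AP = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}$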


18. 


2 


A = 


f2 

f2 


■U 

l2 

2 

0 


"+ 1 

{2 

0 


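In Exercises 19-20, substitute numbers for the $\times$'s so that $A$ is skew-Hermitian.

19. $A = \begin{bmatrix} 0 & i & 2 - 3i \\ \times & 0 & 1 \\ \times & \times & 4i \end{bmatrix}$

Answer:

$A = \begin{bmatrix} 0 & i & 2 - 3i \\ i & 0 & 1 \\ -2 - 3i & -1 & 4i \end{bmatrix}$

20. $A = \begin{bmatrix} 0 & 0 & 3 - 5i \\ \times & 0 & -i \\ \times & \times & 0 \end{bmatrix}$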


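In Exercises 21-22, show that $A$ is not skew-Hermitian for any choice of the $\times$'s.

21. (a) $A = \begin{bmatrix} 0 & i & 2 - 3i \\ -i & 0 & \times \\ 2 + 3i & \times & \times \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & \times & 3 - 5i \\ \times & 2i & -i \\ -3 + 5i & i & 3i \end{bmatrix}$

Answer:

(a) $a_{13} \neq -\overline{a_{31}}$

(b) $a_{11} \neq -\overline{a_{11}}$

22. (a) $A = \begin{bmatrix} i & \times & 2 - 3i \\ \times & 0 & 1 + i \\ 2 + 3i & -1 - i & \times \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & -i & 4 + 7i \\ \times & 0 & \times \\ -4 - 7i & \times & 1 \end{bmatrix}$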


In Exercises 23-24, verify that the eigenvalues of the skew-Hermitian matrix $A$ are pure imaginary numbers.

23. $A = \begin{bmatrix} 0 & -1 + i \\ 1 + i & i \end{bmatrix}$

24. $A = \begin{bmatrix} 0 & 3i \\ 3i & 0 \end{bmatrix}$


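In Exercises 25-26, show that $A$ is normal.

25. $A = \begin{bmatrix} 1 + 2i & 2 + i & -2 - i \\ 2 + i & 1 + i & -i \\ -2 - i & -i & 1 + i \end{bmatrix}$

26. $A = \begin{bmatrix} 2 + 2i & i & 1 - i \\ i & -2i & 1 - 3i \\ 1 - i & 1 - 3i & -3 + 8i \end{bmatrix}$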


27. Show that the matrix



is unitary for all real values of $\theta$. [Note: See Formula 17 in Appendix B for the definition of $e^{i\theta}$.]

28. Prove that each entry on the main diagonal of a skew-Hermitian matrix is either zero or a pure imaginary number.

29. Let $A$ be any $n \times n$ matrix with complex entries, and define the matrices $B$ and $C$ to be

$$B = \tfrac{1}{2}(A + A^*) \quad \text{and} \quad C = \tfrac{1}{2i}(A - A^*)$$

(a) Show that $B$ and $C$ are Hermitian.

(b) Show that $A = B + iC$ and $A^* = B - iC$.

(c) What condition must $B$ and $C$ satisfy for $A$ to be normal?

Answer:

(c) $B$ and $C$ must commute.

30. Show that if $A$ is an $n \times n$ matrix with complex entries, and if $\mathbf{u}$ and $\mathbf{v}$ are vectors in $C^n$ that are expressed in column form, then

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^*\mathbf{v} \quad \text{and} \quad \mathbf{u} \cdot A\mathbf{v} = A^*\mathbf{u} \cdot \mathbf{v}$$

31. Show that if $A$ is a unitary matrix, then so is $A^*$.

32. Show that the eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary.

33. Show that the eigenvalues of a unitary matrix have modulus 1.

34. Show that if $\mathbf{u}$ is a nonzero vector in $C^n$ that is expressed in column form, then $P = \mathbf{u}\mathbf{u}^*$ is Hermitian.

35. Show that if $\mathbf{u}$ is a unit vector in $C^n$ that is expressed in column form, then $H = I - 2\mathbf{u}\mathbf{u}^*$ is Hermitian and unitary.

36. What can you say about the inverse of a matrix $A$ that is both Hermitian and unitary?

37. Find a $2 \times 2$ matrix that is both Hermitian and unitary and whose entries are not all real numbers.


Answer: 


\[2 \[2 

i _1_ 

{2 {2 


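38. Under what conditions is the following matrix normal?

$$A = \begin{bmatrix} a & 0 & 0 \\ 0 & 0 & c \\ 0 & b & 0 \end{bmatrix}$$

39. What geometric interpretations might you reasonably give to multiplication by the matrices $P = \mathbf{u}\mathbf{u}^*$ and $H = I - 2\mathbf{u}\mathbf{u}^*$ in Exercises 34 and 35?

Answer:

Multiplication of $\mathbf{x}$ by $P$ corresponds to $\|\mathbf{u}\|^2$ times the orthogonal projection of $\mathbf{x}$ onto $W = \operatorname{span}\{\mathbf{u}\}$. If $\|\mathbf{u}\| = 1$, then multiplication of $\mathbf{x}$ by $H = I - 2\mathbf{u}\mathbf{u}^*$ corresponds to reflection of $\mathbf{x}$ about the hyperplane $\mathbf{u}^\perp$.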


40. Prove that if $A$ is an invertible matrix, then $A^*$ is invertible, and $(A^*)^{-1} = (A^{-1})^*$.

41. (a) Prove that $\det(\overline{A}) = \overline{\det(A)}$.

(b) Use the result in part (a) and the fact that a square matrix and its transpose have the same determinant to prove that $\det(A^*) = \overline{\det(A)}$.


42. Use part (b) of Exercise 41 to prove:

(a) If $A$ is Hermitian, then $\det(A)$ is real.

(b) If $A$ is unitary, then $|\det(A)| = 1$.

43. Use properties of the transpose and complex conjugate to prove parts (a) and (e) of Theorem 7.5.1.

44. Use properties of the transpose and complex conjugate to prove parts (b) and (d) of Theorem 7.5.1.

45. Prove that an $n \times n$ matrix $A$ with complex entries is unitary if and only if the columns of $A$ form an orthonormal set in $C^n$.

46. Prove that the eigenvalues of a Hermitian matrix are real. 

True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) The matrix

$$\begin{bmatrix} 0 & i \\ i & 2 \end{bmatrix}$$

is Hermitian.

Answer:

False


(b) 


The matrix 


{2 {l {3 

0_J_ 

fe f3 

i i i 

{2 {I {2 


is unitary. 


Answer: 

False 

(c) The conjugate transpose of a unitary matrix is unitary. 










Answer: 


True 

(d) Every unitarily diagonalizable matrix is Hermitian. 

Answer: 

False 

(e) A positive integer power of a skew-Hermitian matrix is skew-Hermitian. 
Answer: 

False 




Supplementary Exercises 


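1. Verify that each matrix is orthogonal, and find its inverse.

(a) $$\begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix}$$

(b) $$\begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix}$$

Answer:

(a) $$\begin{bmatrix} \frac{3}{5} & \frac{4}{5} \\ -\frac{4}{5} & \frac{3}{5} \end{bmatrix}$$

(b) $$\begin{bmatrix} \frac{4}{5} & -\frac{9}{25} & \frac{12}{25} \\ 0 & \frac{4}{5} & \frac{3}{5} \\ -\frac{3}{5} & -\frac{12}{25} & \frac{16}{25} \end{bmatrix}$$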

2. Prove: If $Q$ is an orthogonal matrix, then each entry of $Q$ is the same as its cofactor if $\det(Q) = 1$ and is the negative of its cofactor if $\det(Q) = -1$.

3. Prove that if $A$ is a positive definite symmetric matrix, and if $\mathbf{u}$ and $\mathbf{v}$ are vectors in $R^n$ in column form, then

$$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T A \mathbf{v}$$

is an inner product on $R^n$.

4. Find the characteristic polynomial and the dimensions of the eigenspaces of the symmetric matrix

$$\begin{bmatrix} 3 & 2 & 2 \\ 2 & 3 & 2 \\ 2 & 2 & 3 \end{bmatrix}$$


5. Find a matrix $P$ that orthogonally diagonalizes

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$

and determine the diagonal matrix $D = P^T A P$.


















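Answer:

$$P = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \end{bmatrix}, \qquad P^T A P = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$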
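The answer to Exercise 5 can be confirmed by computing $P^TAP$ directly. This is a small Python sketch in floating-point arithmetic (so entries are compared with a tolerance); `P` is the matrix whose columns are the unit eigenvectors $(1/\sqrt{2}, 0, -1/\sqrt{2})$, $(1/\sqrt{2}, 0, 1/\sqrt{2})$, and $(0, 1, 0)$ of $A$.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(r) for r in zip(*M)]

s = 1 / math.sqrt(2)
A = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]
P = [[s, s, 0],
     [0, 0, 1],
     [-s, s, 0]]

D = matmul(matmul(transpose(P), A), P)   # should be diagonal with entries 0, 2, 1
expected = [[0, 0, 0], [0, 2, 0], [0, 0, 1]]
print(all(abs(D[i][j] - expected[i][j]) < 1e-12 for i in range(3) for j in range(3)))  # True
```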

6. Express each quadratic form in the matrix notation $\mathbf{x}^T A \mathbf{x}$.

(a) $-4x_1^2 + 16x_2^2 - 15x_1x_2$

(b) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$

7. Classify the quadratic form

$$x_1^2 - 3x_1x_2 + 4x_2^2$$

as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

Answer: 
positive definite 
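One way to confirm the classification is Sylvester's criterion, a standard test stating that a symmetric matrix is positive definite exactly when all of its leading principal minors are positive. The Python sketch below applies it with exact fractions to the symmetric matrix of $x_1^2 - 3x_1x_2 + 4x_2^2$; the function names are illustrative.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def is_positive_definite(A):
    """Sylvester's criterion for a symmetric A: every leading principal minor is positive."""
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, len(A) + 1))

# x1^2 - 3 x1 x2 + 4 x2^2 = x^T A x, with the cross-product coefficient -3
# split equally between the two off-diagonal entries.
A = [[Fraction(1), Fraction(-3, 2)],
     [Fraction(-3, 2), Fraction(4)]]
print(det(A), is_positive_definite(A))  # 7/4 True
```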

8. Find an orthogonal change of variable that eliminates the cross product terms in each quadratic form, and express the quadratic form in terms of the new variables.

(a) $-3x_1^2 + 5x_2^2 + 2x_1x_2$

(b) $-5x_1^2 + x_2^2 - x_3^2 + 6x_1x_3 + 4x_1x_2$

9. Identify the type of conic section represented by each equation.

(a) $y - x^2 = 0$

(b) $3x - 11y^2 = 0$

Answer: 

(a) parabola 

(b) parabola 

10. Find a unitary matrix $U$ that diagonalizes

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

and determine the diagonal matrix $D = U^{-1}AU$.

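11. Show that if $U$ is an $n \times n$ unitary matrix and

$$|z_1| = |z_2| = \cdots = |z_n| = 1$$

then the product

$$U \begin{bmatrix} z_1 & 0 & 0 & \cdots & 0 \\ 0 & z_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & z_n \end{bmatrix}$$

is also unitary.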

12. Suppose that $A^* = -A$.

(a) Show that iA is Hermitian. 

(b) Show that A is unitarily diagonalizable and has pure imaginary eigenvalues. 

Linear Transformations


CHAPTER CONTENTS 

General Linear Transformations 
Isomorphism 

Compositions and Inverse Transformations 
Matrices for General Linear Transformations 
Similarity 


INTRODUCTION 

In Section 4.9 and Section 4.10 we studied linear transformations from $R^n$ to $R^m$. In this
chapter we will define and study linear transformations from a general vector space V to a 
general vector space W. The results we obtain here have important applications in physics, 
engineering, and various branches of mathematics. 
8.1 General Linear Transformations

Up to now our study of linear transformations has focused on transformations from $R^n$ to $R^m$. In this section we will turn our attention to linear transformations involving general vector spaces. We will illustrate ways in which such transformations arise, and we will establish a fundamental relationship between general $n$-dimensional vector spaces and $R^n$.


Definitions and Terminology 

In Section 4.9 we defined a matrix transformation $T_A: R^n \to R^m$ to be a mapping of the form

$$T_A(\mathbf{x}) = A\mathbf{x}$$

in which $A$ is an $m \times n$ matrix. We subsequently established in Theorem 4.10.2 and Theorem 4.10.3 that the matrix transformations are precisely the linear transformations from $R^n$ to $R^m$, that is, the transformations with the linearity properties

$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad \text{and} \quad T(k\mathbf{u}) = kT(\mathbf{u})$$

We will use these two properties as the starting point for defining more general linear transformations. 


DEFINITION 1 

If $T: V \to W$ is a function from a vector space $V$ to a vector space $W$, then $T$ is called a linear transformation from $V$ to $W$ if the following two properties hold for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ and for all scalars $k$:

(i) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]

(ii) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]

In the special case where $V = W$, the linear transformation $T$ is called a linear operator on the vector space $V$.


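The homogeneity and additivity properties of a linear transformation $T: V \to W$ can be used in combination to show that if $\mathbf{v}_1$ and $\mathbf{v}_2$ are vectors in $V$ and $k_1$ and $k_2$ are any scalars, then

$$T(k_1\mathbf{v}_1 + k_2\mathbf{v}_2) = k_1T(\mathbf{v}_1) + k_2T(\mathbf{v}_2)$$

More generally, if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are vectors in $V$ and $k_1, k_2, \ldots, k_r$ are any scalars, then

$$T(k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r) = k_1T(\mathbf{v}_1) + k_2T(\mathbf{v}_2) + \cdots + k_rT(\mathbf{v}_r) \tag{1}$$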
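Formula (1) can be illustrated concretely for a matrix transformation $T_A(\mathbf{x}) = A\mathbf{x}$: the image of a linear combination equals the same combination of the images. In this minimal Python sketch the matrix, vectors, and scalars are arbitrary choices made for illustration, and `apply` and `lin_comb` are hypothetical helper names.

```python
def apply(A, x):
    """Return Ax, where A is a list of rows and x is a list of entries."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def lin_comb(coeffs, vectors):
    """Return k1*v1 + ... + kr*vr for vectors of equal length."""
    return [sum(k * v[i] for k, v in zip(coeffs, vectors)) for i in range(len(vectors[0]))]

A = [[2, -1, 0],
     [1, 3, 4]]                      # an arbitrary 2 x 3 matrix, so T_A: R^3 -> R^2
vs = [[1, 0, 2], [-1, 1, 1], [3, 2, 0]]
ks = [4, -2, 5]

lhs = apply(A, lin_comb(ks, vs))               # T(k1 v1 + k2 v2 + k3 v3)
rhs = lin_comb(ks, [apply(A, v) for v in vs])  # k1 T(v1) + k2 T(v2) + k3 T(v3)
print(lhs, rhs)  # [34, 69] [34, 69]
```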


The following theorem is an analog of parts (a) and (d) of Theorem 4.9.1.


THEOREM 8.1.1 


If $T: V \to W$ is a linear transformation, then:

(a) $T(\mathbf{0}) = \mathbf{0}$.

(b) $T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$ for all $\mathbf{u}$ and $\mathbf{v}$ in $V$.


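Proof Let $\mathbf{u}$ be any vector in $V$. Since $0\mathbf{u} = \mathbf{0}$, it follows from the homogeneity property in Definition 1 that

$$T(\mathbf{0}) = T(0\mathbf{u}) = 0T(\mathbf{u}) = \mathbf{0}$$

which proves (a).

We can prove part (b) by rewriting $T(\mathbf{u} - \mathbf{v})$ as

$$T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u} + (-1)\mathbf{v}) = T(\mathbf{u}) + (-1)T(\mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$$

We leave it for you to justify each step.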


Use the two parts of Theorem 8.1.1 to prove that

$$T(-\mathbf{v}) = -T(\mathbf{v})$$

for all $\mathbf{v}$ in $V$.


EXAMPLE 1 Matrix Transformations 

Because we have based the definition of a general linear transformation on the homogeneity and additivity properties of matrix transformations, it follows that a matrix transformation $T_A: R^n \to R^m$ is also a linear transformation in this more general sense with $V = R^n$ and $W = R^m$.


EXAMPLE 2 The Zero Transformation 


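Let $V$ and $W$ be any two vector spaces. The mapping $T: V \to W$ such that $T(\mathbf{v}) = \mathbf{0}$ for every $\mathbf{v}$ in $V$ is a linear transformation called the zero transformation. To see that $T$ is linear, observe that

$$T(\mathbf{u} + \mathbf{v}) = \mathbf{0}, \quad T(\mathbf{u}) = \mathbf{0}, \quad T(\mathbf{v}) = \mathbf{0}, \quad \text{and} \quad T(k\mathbf{u}) = \mathbf{0}$$

Therefore,

$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad \text{and} \quad T(k\mathbf{u}) = kT(\mathbf{u})$$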


EXAMPLE 3 The Identity Operator 

Let $V$ be any vector space. The mapping $I: V \to V$ defined by $I(\mathbf{v}) = \mathbf{v}$ is called the identity operator on $V$. We will leave it for you to verify that $I$ is linear.


EXAMPLE 4 Dilation and Contraction Operators 


If $V$ is a vector space and $k$ is any scalar, then the mapping $T: V \to V$ given by $T(\mathbf{x}) = k\mathbf{x}$ is a linear operator on $V$, for if $c$ is any scalar and if $\mathbf{u}$ and $\mathbf{v}$ are any vectors in $V$, then

$$T(c\mathbf{u}) = k(c\mathbf{u}) = c(k\mathbf{u}) = cT(\mathbf{u})$$

$$T(\mathbf{u} + \mathbf{v}) = k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v} = T(\mathbf{u}) + T(\mathbf{v})$$

If $0 < k < 1$, then $T$ is called the contraction of $V$ with factor $k$, and if $k > 1$, it is called the dilation of $V$ with factor $k$ (Figure 8.1.1).



Figure 8.1.1 


EXAMPLE 5 A Linear Transformation from $P_n$ to $P_{n+1}$

Let $\mathbf{p} = p(x) = c_0 + c_1x + \cdots + c_nx^n$ be a polynomial in $P_n$, and define the transformation $T: P_n \to P_{n+1}$ by

$$T(\mathbf{p}) = T(p(x)) = xp(x) = c_0x + c_1x^2 + \cdots + c_nx^{n+1}$$

This transformation is linear because for any scalar $k$ and any polynomials $\mathbf{p}_1$ and $\mathbf{p}_2$ in $P_n$ we have

$$T(k\mathbf{p}) = T(kp(x)) = x(kp(x)) = k(xp(x)) = kT(\mathbf{p})$$

and

$$T(\mathbf{p}_1 + \mathbf{p}_2) = T(p_1(x) + p_2(x)) = x(p_1(x) + p_2(x)) = xp_1(x) + xp_2(x) = T(\mathbf{p}_1) + T(\mathbf{p}_2)$$
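If a polynomial $c_0 + c_1x + \cdots + c_nx^n$ is stored as its coefficient list $[c_0, c_1, \ldots, c_n]$ (a representation we assume only for illustration), multiplication by $x$ simply shifts every coefficient up one degree. This minimal Python sketch implements the $T$ of Example 5 that way and spot-checks homogeneity and additivity.

```python
def T(p):
    """Multiply by x: every coefficient moves up one degree (P_n -> P_{n+1})."""
    return [0] + list(p)

def add(p, q):
    """Coefficient-wise sum, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(k, p):
    return [k * c for c in p]

p1 = [1, -2, 3]   # 1 - 2x + 3x^2
p2 = [0, 5]       # 5x
k = 7

print(T(p1))                                # [0, 1, -2, 3], i.e. x - 2x^2 + 3x^3
print(T(scale(k, p1)) == scale(k, T(p1)))   # homogeneity: True
print(T(add(p1, p2)) == add(T(p1), T(p2)))  # additivity: True
```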


EXAMPLE 6 A Linear Transformation Using an Inner Product 

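Let $V$ be an inner product space, let $\mathbf{v}_0$ be any fixed vector in $V$, and let $T: V \to R$ be the transformation

$$T(\mathbf{x}) = \langle \mathbf{x}, \mathbf{v}_0 \rangle$$

that maps a vector $\mathbf{x}$ into its inner product with $\mathbf{v}_0$. This transformation is linear, for if $k$ is any scalar, and if $\mathbf{u}$ and $\mathbf{v}$ are any vectors in $V$, then it follows from properties of inner products that

$$T(k\mathbf{u}) = \langle k\mathbf{u}, \mathbf{v}_0 \rangle = k\langle \mathbf{u}, \mathbf{v}_0 \rangle = kT(\mathbf{u})$$

$$T(\mathbf{u} + \mathbf{v}) = \langle \mathbf{u} + \mathbf{v}, \mathbf{v}_0 \rangle = \langle \mathbf{u}, \mathbf{v}_0 \rangle + \langle \mathbf{v}, \mathbf{v}_0 \rangle = T(\mathbf{u}) + T(\mathbf{v})$$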


EXAMPLE 7 Transformations on Matrix Spaces 




Let $M_{nn}$ be the vector space of $n \times n$ matrices. In each part determine whether the transformation is linear.

(a) $T_1(A) = A^T$

(b) $T_2(A) = \det(A)$

Solution

It follows from parts (b) and (d) of Theorem 1.4.8 that

$$T_1(kA) = (kA)^T = kA^T = kT_1(A)$$

$$T_1(A + B) = (A + B)^T = A^T + B^T = T_1(A) + T_1(B)$$

so $T_1$ is linear.

It follows from Formula 1 of Section 2.3 that

$$T_2(kA) = \det(kA) = k^n\det(A) = k^nT_2(A)$$

Thus, $T_2$ is not homogeneous and hence not linear if $n > 1$. Note that additivity also fails because we showed in Example 1 of Section 2.3 that $\det(A + B)$ and $\det(A) + \det(B)$ are not generally equal.
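A concrete $2 \times 2$ case makes the contrast in Example 7 visible. In this Python sketch the matrices $A$, $B$ and the scalar $k = 3$ are arbitrary choices for illustration: transposition passes both linearity tests, while $\det(kA) = k^2\det(A)$ and $\det(A + B) \ne \det(A) + \det(B)$.

```python
def transpose(M):
    return [list(r) for r in zip(*M)]

def scale(k, M):
    return [[k * x for x in row] for row in M]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
k = 3

# T1(A) = A^T passes both tests.
print(transpose(scale(k, A)) == scale(k, transpose(A)))         # True
print(transpose(add(A, B)) == add(transpose(A), transpose(B)))  # True

# T2(A) = det(A): det(kA) = k^2 det(A) for n = 2, not k det(A).
print(det2(scale(k, A)), k * det2(A))      # -18 -6
print(det2(add(A, B)), det2(A) + det2(B))  # -22 -7
```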


EXAMPLE 8 Translation Is Not Linear 

Part (a) of Theorem 8.1.1 states that a linear transformation maps $\mathbf{0}$ to $\mathbf{0}$. This property is useful for identifying transformations that are not linear. For example, if $\mathbf{x}_0$ is a fixed nonzero vector in $R^2$, then the transformation

$$T(\mathbf{x}) = \mathbf{x} + \mathbf{x}_0$$

has the geometric effect of translating each point $\mathbf{x}$ in a direction parallel to $\mathbf{x}_0$ through a distance of $\|\mathbf{x}_0\|$ (Figure 8.1.2). This cannot be a linear transformation since $T(\mathbf{0}) = \mathbf{x}_0$, so $T$ does not map $\mathbf{0}$ to $\mathbf{0}$.

Figure 8.1.2 $T(\mathbf{x}) = \mathbf{x} + \mathbf{x}_0$ translates each point $\mathbf{x}$ along a line parallel to $\mathbf{x}_0$ through a distance $\|\mathbf{x}_0\|$.


EXAMPLE 9 The Evaluation Transformation 






Let $V$ be a subspace of $F(-\infty, \infty)$, let

$$x_1, x_2, \ldots, x_n$$

be distinct real numbers, and let $T: V \to R^n$ be the transformation

$$T(f) = (f(x_1), f(x_2), \ldots, f(x_n)) \tag{2}$$

that associates with $f$ the $n$-tuple of function values at $x_1, x_2, \ldots, x_n$. We call this the evaluation transformation on $V$ at $x_1, x_2, \ldots, x_n$. Thus, for example, if

$$x_1 = -1, \quad x_2 = 2, \quad x_3 = 4$$

and if $f(x) = x^2 - 1$, then

$$T(f) = (f(x_1), f(x_2), f(x_3)) = (0, 3, 15)$$

The evaluation transformation in (2) is linear, for if $k$ is any scalar, and if $f$ and $g$ are any functions in $V$, then

$$\begin{aligned} T(kf) &= ((kf)(x_1), (kf)(x_2), \ldots, (kf)(x_n)) \\ &= (kf(x_1), kf(x_2), \ldots, kf(x_n)) \\ &= k(f(x_1), f(x_2), \ldots, f(x_n)) = kT(f) \end{aligned}$$

and

$$\begin{aligned} T(f + g) &= ((f + g)(x_1), (f + g)(x_2), \ldots, (f + g)(x_n)) \\ &= (f(x_1) + g(x_1), f(x_2) + g(x_2), \ldots, f(x_n) + g(x_n)) \\ &= (f(x_1), f(x_2), \ldots, f(x_n)) + (g(x_1), g(x_2), \ldots, g(x_n)) \\ &= T(f) + T(g) \end{aligned}$$
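The evaluation transformation translates almost word for word into code: it takes a function and returns the tuple of its values at fixed points. This Python sketch reproduces the numbers of Example 9 ($x_1 = -1$, $x_2 = 2$, $x_3 = 4$, $f(x) = x^2 - 1$); the second function $g$, the scalar $k$, and the factory name `evaluation` are our own illustrative choices.

```python
def evaluation(points):
    """Return T with T(f) = tuple of values of f at the given points."""
    def T(f):
        return tuple(f(x) for x in points)
    return T

T = evaluation([-1, 2, 4])

def f(x):
    return x**2 - 1

def g(x):
    return 3 * x + 2

k = 5
print(T(f))  # (0, 3, 15)

# Homogeneity: T(kf) = kT(f)
print(T(lambda x: k * f(x)) == tuple(k * y for y in T(f)))  # True
# Additivity: T(f + g) = T(f) + T(g), with + on R^n taken componentwise
print(T(lambda x: f(x) + g(x)) == tuple(a + b for a, b in zip(T(f), T(g))))  # True
```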


Finding Linear Transformations from Images of Basis Vectors 

We saw in Formula (12) of Section 4.9 that if $T: R^n \to R^m$ is a matrix transformation, say multiplication by $A$, and if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$, then $A$ can be expressed as

$$A = [T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)]$$

It follows from this that the image of any vector $\mathbf{v} = (c_1, c_2, \ldots, c_n)$ in $R^n$ under multiplication by $A$ can be expressed as

$$T(\mathbf{v}) = c_1T(\mathbf{e}_1) + c_2T(\mathbf{e}_2) + \cdots + c_nT(\mathbf{e}_n)$$

This formula tells us that for a matrix transformation the image of any vector is expressible as a linear combination 
of the images of the standard basis vectors. This is a special case of the following more general result. 


THEOREM 8.1.2 


Let $T: V \to W$ be a linear transformation, where $V$ is finite-dimensional. If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for $V$, then the image of any vector $\mathbf{v}$ in $V$ can be expressed as

$$T(\mathbf{v}) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) + \cdots + c_nT(\mathbf{v}_n) \tag{3}$$

where $c_1, c_2, \ldots, c_n$ are the coefficients required to express $\mathbf{v}$ as a linear combination of the vectors in $S$.

Proof Express $\mathbf{v}$ as $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ and use the linearity of $T$.


EXAMPLE 10 Computing with Images of Basis Vectors 

Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where

$$\mathbf{v}_1 = (1, 1, 1), \quad \mathbf{v}_2 = (1, 1, 0), \quad \mathbf{v}_3 = (1, 0, 0)$$

Let $T: R^3 \to R^2$ be the linear transformation for which

$$T(\mathbf{v}_1) = (1, 0), \quad T(\mathbf{v}_2) = (2, -1), \quad T(\mathbf{v}_3) = (4, 3)$$

Find a formula for $T(x_1, x_2, x_3)$, and then use that formula to compute $T(2, -3, 5)$.

Solution We first need to express $\mathbf{x} = (x_1, x_2, x_3)$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. If we write

$$(x_1, x_2, x_3) = c_1(1, 1, 1) + c_2(1, 1, 0) + c_3(1, 0, 0)$$

then on equating corresponding components, we obtain

$$\begin{aligned} c_1 + c_2 + c_3 &= x_1 \\ c_1 + c_2 &= x_2 \\ c_1 &= x_3 \end{aligned}$$

which yields $c_1 = x_3$, $c_2 = x_2 - x_3$, $c_3 = x_1 - x_2$, so

$$\begin{aligned} (x_1, x_2, x_3) &= x_3(1, 1, 1) + (x_2 - x_3)(1, 1, 0) + (x_1 - x_2)(1, 0, 0) \\ &= x_3\mathbf{v}_1 + (x_2 - x_3)\mathbf{v}_2 + (x_1 - x_2)\mathbf{v}_3 \end{aligned}$$

Thus

$$\begin{aligned} T(x_1, x_2, x_3) &= x_3T(\mathbf{v}_1) + (x_2 - x_3)T(\mathbf{v}_2) + (x_1 - x_2)T(\mathbf{v}_3) \\ &= x_3(1, 0) + (x_2 - x_3)(2, -1) + (x_1 - x_2)(4, 3) \\ &= (4x_1 - 2x_2 - x_3,\ 3x_1 - 4x_2 + x_3) \end{aligned}$$

From this formula, we obtain

$$T(2, -3, 5) = (9, 23)$$
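Theorem 8.1.2 suggests a direct computational recipe: solve for the coordinates $c_1, \ldots, c_n$ of $\mathbf{x}$ relative to $S$, then form $c_1T(\mathbf{v}_1) + \cdots + c_nT(\mathbf{v}_n)$. Below is a minimal Python sketch of that recipe with exact arithmetic via `fractions.Fraction`; the names `solve` and `make_T` are illustrative, and the given vectors are assumed to form a basis of $R^n$ so that the coordinate system has a unique solution. It reproduces Example 10.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M c = b by Gauss-Jordan elimination; M must be square and invertible."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, b)]
    for col in range(n):
        # Swap a row with a nonzero entry into the pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def make_T(basis, images):
    """Return the linear map with T(basis[i]) = images[i] (Theorem 8.1.2)."""
    # The columns of M are the basis vectors, so M c = x gives the coordinates c of x.
    M = [list(col) for col in zip(*basis)]
    m = len(images[0])
    def T(x):
        c = solve(M, x)
        return tuple(sum(ci * w[j] for ci, w in zip(c, images)) for j in range(m))
    return T

# Example 10: v1 = (1, 1, 1), v2 = (1, 1, 0), v3 = (1, 0, 0)
T = make_T([(1, 1, 1), (1, 1, 0), (1, 0, 0)], [(1, 0), (2, -1), (4, 3)])
print(T((2, -3, 5)) == (9, 23))  # True
```

The same `make_T` can be reused for Exercises 9 through 12, which follow the identical pattern.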


CALCULUS REQUIRED 

EXAMPLE 11 A Linear Transformation from $C^1(-\infty, \infty)$ to $F(-\infty, \infty)$

Let $V = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$, and let $W = F(-\infty, \infty)$ be the vector space of all real-valued functions defined on $(-\infty, \infty)$. Let $D: V \to W$ be the transformation that maps a function $\mathbf{f} = f(x)$ into its derivative, that is,

$$D(\mathbf{f}) = f'(x)$$

From the properties of differentiation, we have

$$D(\mathbf{f} + \mathbf{g}) = D(\mathbf{f}) + D(\mathbf{g}) \quad \text{and} \quad D(k\mathbf{f}) = kD(\mathbf{f})$$

Thus, $D$ is a linear transformation.


CALCULUS REQUIRED 

EXAMPLE 12 An Integral Transformation 


Let $V = C(-\infty, \infty)$ be the vector space of continuous functions on the interval $(-\infty, \infty)$, let $W = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$, and let $J: V \to W$ be the transformation that maps a function $f$ in $V$ into

$$J(f) = \int_0^x f(t)\,dt$$

For example, if $f(x) = x^2$, then

$$J(f) = \int_0^x t^2\,dt = \frac{t^3}{3}\bigg|_0^x = \frac{x^3}{3}$$

The transformation $J: V \to W$ is linear, for if $k$ is any constant, and if $f$ and $g$ are any functions in $V$, then properties of the integral imply that

$$J(kf) = \int_0^x kf(t)\,dt = k\int_0^x f(t)\,dt = kJ(f)$$

$$J(f + g) = \int_0^x (f(t) + g(t))\,dt = \int_0^x f(t)\,dt + \int_0^x g(t)\,dt = J(f) + J(g)$$
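To make Example 12 computable, we restrict $J$ to polynomials (an assumption made only so the arithmetic is exact) and store $c_0 + c_1t + \cdots + c_nt^n$ as the list $[c_0, c_1, \ldots, c_n]$. Each term $c_it^i$ integrates from $0$ to $x$ to $c_ix^{i+1}/(i+1)$. This Python sketch reproduces $J(x^2) = x^3/3$ and spot-checks linearity; the second polynomial and the scalar are arbitrary choices.

```python
from fractions import Fraction

def J(p):
    """Antiderivative vanishing at 0: c_i t^i integrates to c_i x^(i+1) / (i+1)."""
    return [Fraction(0)] + [Fraction(c) / (i + 1) for i, c in enumerate(p)]

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(k, p):
    return [k * c for c in p]

f = [0, 0, 1]       # f(x) = x^2
g = [4, -6]         # g(x) = 4 - 6x
k = 5

print(J(f) == [0, 0, 0, Fraction(1, 3)])  # x^3 / 3: True
print(J(scale(k, f)) == scale(k, J(f)))   # True
print(J(add(f, g)) == add(J(f), J(g)))    # True
```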


Kernel and Range 

Recall that if $A$ is an $m \times n$ matrix, then the null space of $A$ consists of all vectors $\mathbf{x}$ in $R^n$ such that $A\mathbf{x} = \mathbf{0}$, and by Theorem 4.7.1 the column space of $A$ consists of all vectors $\mathbf{b}$ in $R^m$ for which there is at least one vector $\mathbf{x}$ in $R^n$ such that $A\mathbf{x} = \mathbf{b}$. From the viewpoint of matrix transformations, the null space of $A$ consists of all vectors in $R^n$ that multiplication by $A$ maps into $\mathbf{0}$, and the column space of $A$ consists of all vectors in $R^m$ that are images of at least one vector in $R^n$ under multiplication by $A$. The following definition extends these ideas to general linear transformations.


DEFINITION 2 

If $T: V \to W$ is a linear transformation, then the set of vectors in $V$ that $T$ maps into $\mathbf{0}$ is called the kernel of $T$ and is denoted by $\ker(T)$. The set of all vectors in $W$ that are images under $T$ of at least one vector in $V$ is called the range of $T$ and is denoted by $R(T)$.


EXAMPLE 13 Kernel and Range of a Matrix Transformation 

If $T_A: R^n \to R^m$ is multiplication by the $m \times n$ matrix $A$, then, as discussed above, the kernel of $T_A$ is the null space of $A$, and the range of $T_A$ is the column space of $A$.


EXAMPLE 14 Kernel and Range of the Zero Transformation 

Let $T: V \to W$ be the zero transformation. Since $T$ maps every vector in $V$ into $\mathbf{0}$, it follows that $\ker(T) = V$. Moreover, since $\mathbf{0}$ is the only image under $T$ of vectors in $V$, it follows that $R(T) = \{\mathbf{0}\}$.


EXAMPLE 15 Kernel and Range of the Identity Operator 

Let $I: V \to V$ be the identity operator. Since $I(\mathbf{v}) = \mathbf{v}$ for all vectors in $V$, every vector in $V$ is the image of some vector (namely, itself); thus $R(I) = V$. Since the only vector that $I$ maps into $\mathbf{0}$ is $\mathbf{0}$, it follows that $\ker(I) = \{\mathbf{0}\}$.


EXAMPLE 16 Kernel and Range of an Orthogonal Projection 


Let $T: R^3 \to R^3$ be the orthogonal projection onto the $xy$-plane. As illustrated in Figure 8.1.3a, the points that $T$ maps into $\mathbf{0} = (0, 0, 0)$ are precisely those on the $z$-axis, so $\ker(T)$ is the set of points of the form $(0, 0, z)$. As illustrated in Figure 8.1.3b, $T$ maps the points in $R^3$ to the $xy$-plane, where each point in that plane is the image of every point on the vertical line through it. Thus, $R(T)$ is the set of points of the form $(x, y, 0)$.

Figure 8.1.3 (a) $\ker(T)$ is the $z$-axis.


EXAMPLE 17 Kernel and Range of a Rotation 

Let $T: R^2 \to R^2$ be the linear operator that rotates each vector in the $xy$-plane through the angle $\theta$ (Figure 8.1.4). Since every vector in the $xy$-plane can be obtained by rotating some vector through the angle $\theta$, it follows that $R(T) = R^2$. Moreover, the only vector that rotates into $\mathbf{0}$ is $\mathbf{0}$, so $\ker(T) = \{\mathbf{0}\}$.

Figure 8.1.4
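Both claims of Example 17 can be spot-checked numerically. Any target vector $\mathbf{w}$ is the image of the vector obtained by rotating $\mathbf{w}$ back through $-\theta$, and the standard matrix of the rotation has determinant $\cos^2\theta + \sin^2\theta = 1 \ne 0$, so only $\mathbf{0}$ maps to $\mathbf{0}$. In this Python sketch the angle and the target vector are arbitrary illustrative values.

```python
import math

def rotate(v, theta):
    """Rotate the vector v in the xy-plane counterclockwise through the angle theta."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

theta = 0.7
w = (3.0, -2.0)

# Range: w is the image of the vector obtained by rotating w back through -theta.
v = rotate(w, -theta)
print(all(abs(a - b) < 1e-12 for a, b in zip(rotate(v, theta), w)))  # True

# Kernel: the standard matrix has determinant cos^2 + sin^2 = 1 != 0,
# so only the zero vector is mapped to 0.
det = math.cos(theta) ** 2 + math.sin(theta) ** 2
print(abs(det - 1) < 1e-12)  # True
```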


CALCULUS REQUIRED 

EXAMPLE 18 Kernel of a Differentiation Transformation 

Let $V = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$, let $W = F(-\infty, \infty)$ be the vector space of all real-valued functions defined on $(-\infty, \infty)$, and let $D: V \to W$ be the differentiation transformation $D(\mathbf{f}) = f'(x)$. The kernel of $D$ is the set of functions in $V$ with derivative zero. From calculus, this is the set of constant functions on $(-\infty, \infty)$.


Properties of Kernel and Range 

In all of the preceding examples, $\ker(T)$ and $R(T)$ turned out to be subspaces. In Example 14, Example 15, and Example 17 they were either the zero subspace or the entire vector space. In Example 16 the kernel was a line through the origin, and the range was a plane through the origin, both of which are subspaces of $R^3$. All of this is a consequence of the following general theorem.


THEOREM 8.1.3 

If $T: V \to W$ is a linear transformation, then:

(a) The kernel of $T$ is a subspace of $V$.

(b) The range of $T$ is a subspace of $W$.


Proof (a) To show that $\ker(T)$ is a subspace, we must show that it contains at least one vector and is closed under addition and scalar multiplication. By part (a) of Theorem 8.1.1, the vector $\mathbf{0}$ is in $\ker(T)$, so the kernel contains at least one vector. Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be vectors in $\ker(T)$, and let $k$ be any scalar. Then

$$T(\mathbf{v}_1 + \mathbf{v}_2) = T(\mathbf{v}_1) + T(\mathbf{v}_2) = \mathbf{0} + \mathbf{0} = \mathbf{0}$$

so $\mathbf{v}_1 + \mathbf{v}_2$ is in $\ker(T)$. Also,

$$T(k\mathbf{v}_1) = kT(\mathbf{v}_1) = k\mathbf{0} = \mathbf{0}$$

so $k\mathbf{v}_1$ is in $\ker(T)$.


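Proof (b) To show that $R(T)$ is a subspace of $W$, we must show that it contains at least one vector and is closed under addition and scalar multiplication. It contains at least the zero vector of $W$, since $T(\mathbf{0}) = \mathbf{0}$ by part (a) of Theorem 8.1.1. To prove that it is closed under addition and scalar multiplication, we must show that if $\mathbf{w}_1$ and $\mathbf{w}_2$ are vectors in $R(T)$, and if $k$ is any scalar, then there exist vectors $\mathbf{a}$ and $\mathbf{b}$ in $V$ for which

$$T(\mathbf{a}) = \mathbf{w}_1 + \mathbf{w}_2 \quad \text{and} \quad T(\mathbf{b}) = k\mathbf{w}_1 \tag{4}$$

But the fact that $\mathbf{w}_1$ and $\mathbf{w}_2$ are in $R(T)$ tells us that there exist vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $V$ such that

$$T(\mathbf{v}_1) = \mathbf{w}_1 \quad \text{and} \quad T(\mathbf{v}_2) = \mathbf{w}_2$$

The following computations complete the proof by showing that the vectors $\mathbf{a} = \mathbf{v}_1 + \mathbf{v}_2$ and $\mathbf{b} = k\mathbf{v}_1$ satisfy the equations in (4):

$$T(\mathbf{a}) = T(\mathbf{v}_1 + \mathbf{v}_2) = T(\mathbf{v}_1) + T(\mathbf{v}_2) = \mathbf{w}_1 + \mathbf{w}_2$$

$$T(\mathbf{b}) = T(k\mathbf{v}_1) = kT(\mathbf{v}_1) = k\mathbf{w}_1$$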

CALCULUS REQUIRED 

EXAMPLE 19 Application to Differential Equations 

Differential equations of the form

$$y'' + \omega^2 y = 0 \quad (\omega \text{ a positive constant}) \tag{5}$$

arise in the study of vibrations. The set of all solutions of this equation on the interval $(-\infty, \infty)$ is the kernel of the linear transformation $D: C^2(-\infty, \infty) \to C(-\infty, \infty)$, given by

$$D(y) = y'' + \omega^2 y$$

It is proved in standard textbooks on differential equations that the kernel is a two-dimensional subspace of $C^2(-\infty, \infty)$, so that if we can find two linearly independent solutions of (5), then all other solutions can be expressed as linear combinations of those two. We leave it for you to confirm by differentiating that

$$y_1 = \cos \omega x \quad \text{and} \quad y_2 = \sin \omega x$$

are solutions of (5). These functions are linearly independent since neither is a scalar multiple of the other, and thus

$$y = c_1\cos \omega x + c_2\sin \omega x \tag{6}$$

is a “general solution” of (5) in the sense that every choice of $c_1$ and $c_2$ produces a solution, and every solution is of this form.
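Membership of (6) in the kernel of $D(y) = y'' + \omega^2y$ can also be spot-checked numerically. This Python sketch approximates $y''$ with a central second difference (a standard finite-difference approximation, chosen here only for illustration), so the residual is small but not exactly zero; $\omega$, $c_1$, $c_2$, and the sample points are arbitrary choices.

```python
import math

def residual(y, x, omega, h=1e-4):
    """Approximate y''(x) + omega^2 y(x) using a central second difference."""
    second = (y(x + h) - 2 * y(x) + y(x - h)) / h**2
    return second + omega**2 * y(x)

omega = 2.5
c1, c2 = 1.3, -0.4

def y(x):
    # A member of the family (6): y = c1 cos(omega x) + c2 sin(omega x)
    return c1 * math.cos(omega * x) + c2 * math.sin(omega * x)

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(max(abs(residual(y, x, omega)) for x in xs) < 1e-5)  # True (up to discretization error)

# A function outside the kernel, for contrast: y = x^2 gives y'' + omega^2 y = 2 at x = 0.
print(round(residual(lambda x: x * x, 0.0, omega), 6))  # 2.0
```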


Rank and Nullity of Linear Transformations 


In Definition 1 of Section 4.8 we defined the notions of rank and nullity for an $m \times n$ matrix, and in Theorem 4.8.2, which we called the Dimension Theorem, we proved that the sum of the rank and nullity is $n$. We will show next that this result is a special case of a more general result about linear transformations. We start with the following definition.


DEFINITION 3 

Let $T: V \to W$ be a linear transformation. If the range of $T$ is finite-dimensional, then its dimension is called the rank of $T$; and if the kernel of $T$ is finite-dimensional, then its dimension is called the nullity of $T$. The rank of $T$ is denoted by $\text{rank}(T)$ and the nullity of $T$ by $\text{nullity}(T)$.


The following theorem, whose proof is optional, generalizes Theorem 4.8.2. 


THEOREM 8.1.4

Dimension Theorem for Linear Transformations

If $T: V \to W$ is a linear transformation from an $n$-dimensional vector space $V$ to a vector space $W$, then

$$\text{rank}(T) + \text{nullity}(T) = n \tag{7}$$

In the special case where $A$ is an $m \times n$ matrix and $T_A: R^n \to R^m$ is multiplication by $A$, the kernel of $T_A$ is the null space of $A$, and the range of $T_A$ is the column space of $A$. Thus, it follows from Theorem 8.1.4 that

$$\text{rank}(T_A) + \text{nullity}(T_A) = n$$
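For a matrix transformation, the rank can be computed by row reduction, and Formula (7) then gives the nullity. This Python sketch does the reduction with exact fractions on the matrix of Exercise 23, and independently confirms that the nonzero vector $(-14, 19, 11)$ lies in the kernel; the function name `rank` and the choice of example matrix are ours.

```python
from fractions import Fraction

def rank(A):
    """Rank of A: the number of pivots after exact row reduction to echelon form."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                     # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

# The matrix of Exercise 23, viewed as T_A: R^3 -> R^3
A = [[1, -1, 3],
     [5, 6, -4],
     [7, 4, 2]]
n = len(A[0])
rk = rank(A)
nullity = n - rk   # from Formula (7): rank(T_A) + nullity(T_A) = n
print(rk, nullity)  # 2 1

# Independent check that the kernel is not {0}: A(-14, 19, 11) = 0
print([sum(a * x for a, x in zip(row, (-14, 19, 11))) for row in A])  # [0, 0, 0]
```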


OPTIONAL 


Proof of Theorem 8.1.4 We must show that

$$\dim(R(T)) + \dim(\ker(T)) = n$$

We will give the proof for the case where $1 \le \dim(\ker(T)) < n$. The cases where $\dim(\ker(T)) = 0$ and $\dim(\ker(T)) = n$ are left as exercises. Assume $\dim(\ker(T)) = r$ and let $\mathbf{v}_1, \ldots, \mathbf{v}_r$ be a basis for the kernel. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is linearly independent, Theorem 4.5.5(b) states that there are $n - r$ vectors, $\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n$, such that the extended set $\{\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$. To complete the proof, we will show that the $n - r$ vectors in the set $S = \{T(\mathbf{v}_{r+1}), \ldots, T(\mathbf{v}_n)\}$ form a basis for the range of $T$. It will then follow that

$$\dim(R(T)) + \dim(\ker(T)) = (n - r) + r = n$$


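First we show that $S$ spans the range of $T$. If $\mathbf{b}$ is any vector in the range of $T$, then $\mathbf{b} = T(\mathbf{v})$ for some vector $\mathbf{v}$ in $V$. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$, the vector $\mathbf{v}$ can be written in the form

$$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_r\mathbf{v}_r + c_{r+1}\mathbf{v}_{r+1} + \cdots + c_n\mathbf{v}_n$$

Since $\mathbf{v}_1, \ldots, \mathbf{v}_r$ lie in the kernel of $T$, we have $T(\mathbf{v}_1) = \cdots = T(\mathbf{v}_r) = \mathbf{0}$, so

$$\mathbf{b} = T(\mathbf{v}) = c_{r+1}T(\mathbf{v}_{r+1}) + \cdots + c_nT(\mathbf{v}_n)$$

Thus $S$ spans the range of $T$.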

Finally, we show that $S$ is a linearly independent set and consequently forms a basis for the range of $T$. Suppose that some linear combination of the vectors in $S$ is zero; that is,

$$k_{r+1}T(\mathbf{v}_{r+1}) + \cdots + k_nT(\mathbf{v}_n) = \mathbf{0} \tag{8}$$

We must show that $k_{r+1} = \cdots = k_n = 0$. Since $T$ is linear, (8) can be rewritten as

$$T(k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n) = \mathbf{0}$$

which says that $k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n$ is in the kernel of $T$. This vector can therefore be written as a linear combination of the basis vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$, say

$$k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n = k_1\mathbf{v}_1 + \cdots + k_r\mathbf{v}_r$$

Thus,

$$k_1\mathbf{v}_1 + \cdots + k_r\mathbf{v}_r - k_{r+1}\mathbf{v}_{r+1} - \cdots - k_n\mathbf{v}_n = \mathbf{0}$$

Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent, all of the $k$'s are zero; in particular, $k_{r+1} = \cdots = k_n = 0$, which completes the proof.


Concept Review 

Linear transformation 
Linear operator 
Zero transformation 
Identity operator 
Contraction 
Dilation 

Evaluation transformation 

Kernel 

Range 

Rank 

Nullity 

Skills 

Determine whether a function is a linear transformation. 

Find a formula for a linear transformation $T: V \to W$ given the values of $T$ on a basis for $V$.
Find a basis for the kernel of a linear transformation. 

Find a basis for the range of a linear transformation. 

Find the rank of a linear transformation. 

Find the nullity of a linear transformation. 




Exercise Set 8.1 


In Exercises 1-8, determine whether the function is a linear transformation. Justify your answer. 
1. $T: V \to R$, where $V$ is an inner product space, and $T(\mathbf{u}) = \|\mathbf{u}\|$.

Answer: 

Nonlinear 

2. $T: R^3 \to R^3$, where $\mathbf{v}_0$ is a fixed vector in $R^3$ and $T(\mathbf{u}) = \mathbf{u} \times \mathbf{v}_0$.

3. $T: M_{22} \to M_{23}$, where $B$ is a fixed $2 \times 3$ matrix and $T(A) = AB$.

Answer: 

Linear 


4. $T: M_{nn} \to R$, where $T(A) = \text{tr}(A)$.

5. $F: M_{mn} \to M_{nm}$, where $F(A) = A^T$.


Answer: 


Linear 

6. $T: M_{22} \to R$, where



7. $T: P_2 \to P_2$, where



Answer: 


(a) Linear 

(b) Nonlinear 


8. $T: F(-\infty, \infty) \to F(-\infty, \infty)$, where

(a) $T(f(x)) = 1 + f(x)$

(b) $T(f(x)) = f(x + 1)$


9. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ for $R^2$, where $\mathbf{v}_1 = (1, 1)$ and $\mathbf{v}_2 = (1, 0)$, and let $T: R^2 \to R^2$ be the linear operator for which

$$T(\mathbf{v}_1) = (1, -2) \quad \text{and} \quad T(\mathbf{v}_2) = (-4, 1)$$

Find a formula for $T(x_1, x_2)$, and use that formula to find $T(5, -3)$.






Answer: 


$T(x_1, x_2) = (-4x_1 + 5x_2,\ x_1 - 3x_2)$; $T(5, -3) = (-35, 14)$

10. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ for $R^2$, where $\mathbf{v}_1 = (-2, 1)$ and $\mathbf{v}_2 = (1, 3)$, and let $T: R^2 \to R^3$ be the linear transformation such that

$$T(\mathbf{v}_1) = (-1, 2, 0) \quad \text{and} \quad T(\mathbf{v}_2) = (0, -3, 5)$$

Find a formula for $T(x_1, x_2)$, and use that formula to find $T(2, -3)$.

11. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where $\mathbf{v}_1 = (1, 1, 1)$, $\mathbf{v}_2 = (1, 1, 0)$, and $\mathbf{v}_3 = (1, 0, 0)$, and let $T: R^3 \to R^3$ be the linear operator for which

$$T(\mathbf{v}_1) = (2, -1, 4), \quad T(\mathbf{v}_2) = (3, 0, 1), \quad T(\mathbf{v}_3) = (-1, 5, 1)$$

Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(2, 4, -1)$.

Answer:

$T(x_1, x_2, x_3) = (-x_1 + 4x_2 - x_3,\ 5x_1 - 5x_2 - x_3,\ x_1 + 3x_3)$; $T(2, 4, -1) = (15, -9, -1)$

12. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 9, 0)$, and $\mathbf{v}_3 = (3, 3, 4)$, and let $T: R^3 \to R^2$ be the linear transformation for which

$$T(\mathbf{v}_1) = (1, 0), \quad T(\mathbf{v}_2) = (-1, 1), \quad T(\mathbf{v}_3) = (0, 1)$$

Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(7, 13, 7)$.

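13. Let $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ be vectors in a vector space $V$, and let $T: V \to R^3$ be a linear transformation for which

$$T(\mathbf{v}_1) = (1, -1, 2), \quad T(\mathbf{v}_2) = (0, 3, 2), \quad T(\mathbf{v}_3) = (-3, 1, 2)$$

Find $T(2\mathbf{v}_1 - 3\mathbf{v}_2 + 4\mathbf{v}_3)$.

Answer:

$T(2\mathbf{v}_1 - 3\mathbf{v}_2 + 4\mathbf{v}_3) = (-10, -7, 6)$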

14. Let $T: R^2 \to R^2$ be the linear operator given by the formula

$$T(x, y) = (2x - y, -8x + 4y)$$

Which of the following vectors are in $R(T)$?

(a) $(1, -4)$

(b) $(5, 0)$

(c) $(-3, 12)$

15. Let $T: R^2 \to R^2$ be the linear operator in Exercise 14. Which of the following vectors are in $\ker(T)$?

(a) $(5, 10)$

(b) $(3, 2)$

(c) $(1, 1)$

Answer: 


(a) 



16. Let $T: R^4 \to R^3$ be the linear transformation given by the formula

$$T(x_1, x_2, x_3, x_4) = (4x_1 + x_2 - 2x_3 - 3x_4,\ 2x_1 + x_2 + x_3 - 4x_4,\ 6x_1 - 9x_3 + 9x_4)$$

Which of the following are in $R(T)$?

(a) $(0, 0, 6)$

(b) $(1, 3, 0)$

(c) $(2, 4, 1)$

17. Let $T: R^4 \to R^3$ be the linear transformation in Exercise 16. Which of the following are in $\ker(T)$?

(a) $(3, -8, 2, 0)$

(b) $(0, 0, 0, 1)$

(c) $(0, -4, 1, 0)$

Answer: 

(a) 

18. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x)$. Which of the following are in $\ker(T)$?

(a) $x^2$

(b) $0$

(c) $1 + x$

19. Let $T: P_2 \to P_3$ be the linear transformation in Exercise 18. Which of the following are in $R(T)$?

(a) $x + x^2$

(b) $1 + x$

(c) $3 - x^2$

Answer: 

(a) 

20. Find a basis for the kernel of 

(a) the linear operator in Exercise 14. 

(b) the linear transformation in Exercise 16. 

(c) the linear transformation in Exercise 18. 

21. Find a basis for the range of 

(a) the linear operator in Exercise 14. 

(b) the linear transformation in Exercise 16. 

(c) the linear transformation in Exercise 18. 

Answer: 


(a) (1, -4) 

(b) (4,2,6), (1,1,0), (-3, -4,9) 



(c) $x, x^2, x^3$

22. Verify Formula 7 of the dimension theorem for 

(a) the linear operator in Exercise 14. 

(b) the linear transformation in Exercise 16. 

(c) the linear transformation in Exercise 18. 

In Exercises 23-26, let $T$ be multiplication by the matrix $A$. Find

(a) a basis for the range of T. 

(b) a basis for the kernel of T. 

(c) the rank and nullity of T. 

(d) the rank and nullity of A. 


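23. $$A = \begin{bmatrix} 1 & -1 & 3 \\ 5 & 6 & -4 \\ 7 & 4 & 2 \end{bmatrix}$$

Answer:

(a) $\begin{bmatrix} 1 \\ 5 \\ 7 \end{bmatrix}, \begin{bmatrix} -1 \\ 6 \\ 4 \end{bmatrix}$

(b) $\begin{bmatrix} -14 \\ 19 \\ 11 \end{bmatrix}$

(c) $\text{Rank}(T) = 2$, $\text{nullity}(T) = 1$

(d) $\text{Rank}(A) = 2$, $\text{nullity}(A) = 1$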


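24. $$A = \begin{bmatrix} 2 & 0 & -1 \\ 4 & 0 & -2 \\ 20 & 0 & 0 \end{bmatrix}$$

25. $$A = \begin{bmatrix} 4 & 1 & 5 & 2 \\ 1 & 2 & 3 & 0 \end{bmatrix}$$

Answer:

(a) $\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(b) $\begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -4 \\ 2 \\ 0 \\ 7 \end{bmatrix}$

(c) $\text{Rank}(T) = \text{nullity}(T) = 2$

(d) $\text{Rank}(A) = \text{nullity}(A) = 2$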



26. $$A = \begin{bmatrix} 1 & 4 & 5 & 0 & 9 \\ 3 & -2 & 1 & 0 & -1 \\ -1 & 0 & -1 & 0 & -1 \\ 2 & 3 & 5 & 1 & 8 \end{bmatrix}$$

27. Describe the kernel and range of 

(a) the orthogonal projection on the $xz$-plane.

(b) the orthogonal projection on the $yz$-plane.

(c) the orthogonal projection on the plane defined by the equation $y = x$.

Answer:

(a) Kernel: $y$-axis; range: $xz$-plane

(b) Kernel: $x$-axis; range: $yz$-plane

(c) Kernel: the line through the origin perpendicular to the plane $y = x$; range: plane $y = x$

28. Let $V$ be any vector space, and let $T: V \to V$ be defined by $T(\mathbf{v}) = 3\mathbf{v}$.

(a) What is the kernel of $T$?

(b) What is the range of $T$?

29. In each part, use the given information to find the nullity of the linear transformation T. 

(a) $T: R^5 \to R^7$ has rank 3.

(b) $T: P_4 \to P_3$ has rank 1.

(c) The range of $T: R^6 \to R^3$ is $R^3$.

(d) $T: M_{22} \to M_{22}$ has rank 3.

Answer: 

(a) Nullity (T) = 2 

(b) Nullity (T) =4 

(c) Nullity (T) = 3 

(d) Nullity (T) = 1 

30. Let $A$ be a $7 \times 6$ matrix such that $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, and let $T: R^6 \to R^7$ be multiplication by $A$. Find the rank and nullity of $T$.

31. Let A be a 5 x 7 matrix with rank 4. 

(a) What is the dimension of the solution space of Ax = 0? 

(b) Is $A\mathbf{x} = \mathbf{b}$ consistent for all vectors $\mathbf{b}$ in $R^5$? Explain.

Answer: 

(a) 3 

(b) No 

32. Let $T: R^3 \to W$ be a linear transformation from $R^3$ to any vector space. Give a geometric description of $\ker(T)$.



33. Let $T: V \to R^3$ be a linear transformation from any vector space to $R^3$. Give a geometric description of $R(T)$.

Answer:

A line through the origin, a plane through the origin, the origin only, or all of $R^3$

34. Let $T: R^3 \to R^3$ be multiplication by

$$\begin{bmatrix} 1 & 3 & 4 \\ 3 & 4 & 7 \\ -2 & 2 & 0 \end{bmatrix}$$

(a) Show that the kernel of T is a line through the origin, and find parametric equations for it. 

(b) Show that the range of T is a plane through the origin, and find an equation for it. 

35. (a) Show that if $a_1$, $a_2$, $b_1$, and $b_2$ are any scalars, then the formula

$$F(x, y) = (a_1x + b_1y, a_2x + b_2y)$$

defines a linear operator on $R^2$.

(b) Does the formula $F(x, y) = (a_1x^2 + b_1y^2, a_2x^2 + b_2y^2)$ define a linear operator on $R^2$? Explain.

Answer:

(b) No

36. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$, and let $T: V \to W$ be a linear transformation. Show that if

$$T(\mathbf{v}_1) = T(\mathbf{v}_2) = \cdots = T(\mathbf{v}_n) = \mathbf{0}$$

then $T$ is the zero transformation.

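37. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$, and let $T: V \to V$ be a linear operator. Show that if

$$T(\mathbf{v}_1) = \mathbf{v}_1, \quad T(\mathbf{v}_2) = \mathbf{v}_2, \quad \ldots, \quad T(\mathbf{v}_n) = \mathbf{v}_n$$

then $T$ is the identity transformation on $V$.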

38. For a positive integer $n > 1$, let $T: M_{nn} \to R$ be the linear transformation defined by $T(A) = \text{tr}(A)$, where $A$ is an $n \times n$ matrix with real entries. Determine the dimension of $\ker(T)$.

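39. Prove: If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for $V$ and $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ are vectors in $W$, not necessarily distinct, then there exists a linear transformation $T: V \to W$ such that

$$T(\mathbf{v}_1) = \mathbf{w}_1, \quad T(\mathbf{v}_2) = \mathbf{w}_2, \quad \ldots, \quad T(\mathbf{v}_n) = \mathbf{w}_n$$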

40. (Calculus required) Let $V = C[a, b]$ be the vector space of functions continuous on $[a, b]$, and let $T: V \to V$ be the transformation defined by

$$T(\mathbf{f}) = 5f(x) + 3\int_a^x f(t)\,dt$$

Is $T$ a linear operator?

41. (Calculus required) Let $D: P_3 \to P_2$ be the differentiation transformation $D(\mathbf{p}) = p'(x)$. What is the kernel of $D$?

Answer: 


ker(D) consists of all constant polynomials. 



42. 


(Calculus required) Let J P\ 


of J? 


£ be the integration transformation J(p) 



p(x)dx • What is the kernel 


43. (Calculus required) Let $V$ be the vector space of real-valued functions with continuous derivatives of all orders on the interval $(-\infty, \infty)$, and let $W = F(-\infty, \infty)$ be the vector space of real-valued functions defined on $(-\infty, \infty)$.

(a) Find a linear transformation $T: V \to W$ whose kernel is $P_3$.

(b) Find a linear transformation $T: V \to W$ whose kernel is $P_n$.

Answer:

(a) $T(f(x)) = f^{(4)}(x)$

(b) $T(f(x)) = f^{(n+1)}(x)$

44. If A is an m × n matrix, and if the linear system Ax = b is consistent for every vector b in R^m, what can you say about the range of T_A: R^n → R^m?

True-False Exercises 


In parts (a)-(i) determine whether the statement is true or false, and justify your answer. 

(a) If T(c1v1 + c2v2) = c1T(v1) + c2T(v2) for all vectors v1 and v2 in V and all scalars c1 and c2, then T is a linear transformation.

Answer: 

True 

(b) If v is a nonzero vector in V, then there is exactly one linear transformation T: V → W such that T(−v) = −T(v).

Answer: 

False 

(c) There is exactly one linear transformation T: V → W for which T(u + v) = T(u − v) for all vectors u and v in V.

Answer: 

True 

(d) If v0 is a nonzero vector in V, then the formula T(v) = v0 + v defines a linear operator on V.

Answer: 

False 

(e) The kernel of a linear transformation is a vector space. 

Answer: 


True 


(f) The range of a linear transformation is a vector space. 

Answer: 

True 

(g) If T: P6 → M22 is a linear transformation, then the nullity of T is 3.

Answer: 

False 

(h) The function T: M22 → R defined by T(A) = det A is a linear transformation.

Answer: 

False 

(i) The linear transformation T: M22 → M22 defined by

T(A)



has rank 1. 
Answer: 

False 
8.2 Isomorphism

In this section we will establish a fundamental connection between real finite-dimensional vector spaces and the Euclidean space R^n. This connection is not only important theoretically, but it has practical applications in that it allows us to perform vector computations in general vector spaces by working with the vectors in R^n.


One-to-One and Onto 

Although many of the theorems in this text have been concerned exclusively with the vector space R^n, this is not as limiting as it might seem. As we will show, the vector space R^n is the “mother” of all real n-dimensional vector spaces in the sense that any such space might differ from R^n in the notation used to represent vectors, but not in its algebraic structure. To explain what we mean by this, we will need two definitions, the first of which is a generalization of Definition 1 in Section 4.10 (see Figure 8.2.1).

DEFINITION 1

If T: V → W is a linear transformation from a vector space V to a vector space W, then T is said to be one-to-one if T maps distinct vectors in V into distinct vectors in W.


DEFINITION 2

If T: V → W is a linear transformation from a vector space V to a vector space W, then T is said to be onto (or onto W) if every vector in W is the image of at least one vector in V.


Figure 8.2.1 One-to-one: distinct vectors in V have distinct images in W. Not one-to-one: there exist distinct vectors in V with the same image. Onto W: every vector in W is the image of some vector in V. Not onto W: not every vector in W is the image of some vector in V.


The following theorem provides a useful way of telling whether a linear transformation is one-to-one by examining its 
kernel. 


THEOREM 8.2.1 


If T: V → W is a linear transformation, then the following statements are equivalent.

(a) T is one-to-one.

(b) ker(T) = {0}.


Proof (a) ⇒ (b) Since T is linear, we know that T(0) = 0 by Theorem 8.1.1a. Since T is one-to-one, there can be no other vectors in V that map into 0, so ker(T) = {0}.

(b) ⇒ (a) Assume that ker(T) = {0}. If u and v are distinct vectors in V, then u − v ≠ 0. This implies that T(u − v) ≠ 0, for otherwise ker(T) would contain a nonzero vector. Since T is linear, it follows that

$$T(\mathbf{u}) - T(\mathbf{v}) = T(\mathbf{u} - \mathbf{v}) \neq \mathbf{0}$$

so T maps distinct vectors in V into distinct vectors in W and hence is one-to-one.


In the special case where V is finite-dimensional and T is a linear operator on V, we can add a third statement to those in Theorem 8.2.1.


THEOREM 8.2.2 

If V is a finite-dimensional vector space, and if T: V → V is a linear operator, then the following statements are equivalent.

(a) T is one-to-one.

(b) ker(T) = {0}.

(c) T is onto [i.e., R(T) = V].


Proof We already know that (a) and (b) are equivalent by Theorem 8.2.1, so it suffices to show that (b) and (c) are equivalent. We leave it for you to do this by assuming that dim(V) = n and applying Theorem 8.1.4.


EXAMPLE 1 Dilations and Contractions Are One-to-One and Onto 

Show that if V is a finite-dimensional vector space and c is any nonzero scalar, then the linear operator T: V → V defined by T(v) = cv is one-to-one and onto.

Solution The operator T is onto (and hence one-to-one, by Theorem 8.2.2), for if v is any vector in V, then that vector is the image of the vector (1/c)v.


EXAMPLE 2 Matrix Operators 

If T_A: R^n → R^n is the matrix operator T_A(x) = Ax, then it follows from parts (r) and (s) of Theorem 5.1.6 that T_A is one-to-one and onto if and only if A is invertible.
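For a matrix operator the kernel test can be carried out mechanically: T_A is one-to-one exactly when rank(A) equals the number of columns of A (so that ker(T_A) = {0}), and onto exactly when rank(A) equals the number of rows. The following minimal Python sketch is an illustration under those assumptions, not part of the text. It computes the rank by exact Gauss-Jordan elimination with `fractions.Fraction`; the helper names `rank`, `is_one_to_one`, and `is_onto` are hypothetical. For a square A the two tests succeed together precisely when A is invertible.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0])):
        # look for a nonzero entry in column c at or below row r
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # clear column c in every other row
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_one_to_one(A):
    # ker(T_A) = {0} exactly when rank(A) equals the number of columns
    return rank(A) == len(A[0])

def is_onto(A):
    # T_A: R^n -> R^m is onto exactly when the columns of A span R^m, i.e. rank(A) == m
    return rank(A) == len(A)

invertible = [[1, 2], [3, 4]]    # det = -2
singular = [[1, 2], [2, 4]]      # second row is twice the first
tall = [[1, 0], [0, 1], [1, 1]]  # T_A: R^2 -> R^3

print(is_one_to_one(invertible), is_onto(invertible))  # True True
print(is_one_to_one(singular), is_onto(singular))      # False False
print(is_one_to_one(tall), is_onto(tall))              # True False
```

The non-square `tall` case shows that the two properties separate once the domain and codomain have different dimensions.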


EXAMPLE 3 Shifting Operators 


Let V = R^∞ be the sequence space discussed in Example 3 of Section 4.1, and consider the linear “shifting operators” on V defined by

$$T_1(u_1, u_2, \ldots, u_n, \ldots) = (0, u_1, u_2, \ldots, u_n, \ldots)$$

$$T_2(u_1, u_2, \ldots, u_n, \ldots) = (u_2, u_3, \ldots, u_n, \ldots)$$

(a) Show that T1 is one-to-one but not onto.

(b) Show that T2 is onto but not one-to-one.

Solution

(a) The operator T1 is one-to-one because distinct sequences in R^∞ obviously have distinct images. This operator is not onto because no vector in R^∞ maps into the sequence (1, 0, 0, 0, ...), for example.

(b) The operator T2 is not one-to-one because, for example, the vectors (1, 0, 0, ..., 0, ...) and (2, 0, 0, 0, ...) both map into (0, 0, 0, 0, ...). This operator is onto because every possible sequence of real numbers can be obtained with an appropriate choice of the numbers u2, u3, ..., un, ....
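The two shifting operators can be sketched in Python by modeling an infinite sequence as a function from a 0-based index to its term. This representation, and the names `T1`, `T2`, and `prefix`, are assumptions made for illustration, and an infinite sequence can only be inspected through a finite prefix. The sketch shows that T2 undoes T1 but not the other way around, and that T2 sends two distinct sequences to the zero sequence.

```python
from typing import Callable

# A sequence (u1, u2, u3, ...) is modeled as a function i -> u_{i+1} (0-based index).
Seq = Callable[[int], float]

def T1(u: Seq) -> Seq:
    """Right shift: (u1, u2, u3, ...) -> (0, u1, u2, ...)."""
    return lambda i: 0 if i == 0 else u(i - 1)

def T2(u: Seq) -> Seq:
    """Left shift: (u1, u2, u3, ...) -> (u2, u3, u4, ...)."""
    return lambda i: u(i + 1)

def prefix(u: Seq, k: int) -> list:
    """First k terms of u; infinite sequences can only be inspected this way."""
    return [u(i) for i in range(k)]

u = lambda i: i + 1                    # (1, 2, 3, ...)
e1 = lambda i: 1 if i == 0 else 0      # (1, 0, 0, ...)
two_e1 = lambda i: 2 if i == 0 else 0  # (2, 0, 0, ...)

print(prefix(T2(T1(u)), 5))  # [1, 2, 3, 4, 5]: T2 undoes T1
print(prefix(T1(T2(u)), 5))  # [0, 2, 3, 4, 5]: T1 does not undo T2
print(prefix(T2(e1), 4), prefix(T2(two_e1), 4))  # [0, 0, 0, 0] [0, 0, 0, 0]
```

Every output of `T1` begins with 0, which is why (1, 0, 0, ...) is never reached. Neither conclusion conflicts with Theorem 8.2.2, because R^∞ is not finite-dimensional.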


Why does Example 3 not violate Theorem 8.2.2? 


EXAMPLE 4 Basic Transformations That Are One-to-One and Onto 


The linear transformations T1: P3 → R^4 and T2: M22 → R^4 defined by

$$T_1(a + bx + cx^2 + dx^3) = (a, b, c, d)$$

$$T_2\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a, b, c, d)$$

are both one-to-one and onto (verify by showing that their kernels contain only the zero vector).


EXAMPLE 5 A One-to-One Linear Transformation 

Let T: Pn → P_{n+1} be the linear transformation

$$T(\mathbf{p}) = T(p(x)) = xp(x)$$

discussed in Example 5 of Section 8.1. If

$$\mathbf{p} = p(x) = c_0 + c_1x + \cdots + c_nx^n \quad\text{and}\quad \mathbf{q} = q(x) = d_0 + d_1x + \cdots + d_nx^n$$

are distinct polynomials, then they differ in at least one coefficient. Thus,

$$T(\mathbf{p}) = c_0x + c_1x^2 + \cdots + c_nx^{n+1} \quad\text{and}\quad T(\mathbf{q}) = d_0x + d_1x^2 + \cdots + d_nx^{n+1}$$

also differ in at least one coefficient. It follows that T is one-to-one, since it maps distinct polynomials p and q into distinct polynomials T(p) and T(q).

CALCULUS REQUIRED 

EXAMPLE 6 A Transformation That Is Not One-to-One 


Let

$$D: C^1(-\infty, \infty) \to F(-\infty, \infty)$$

be the differentiation transformation discussed in Example 11 of Section 8.1. This linear transformation is not one-to-one because it maps functions that differ by a constant into the same function. For example,

$$D(x^2) = D(x^2 + 1) = 2x$$


Dimension and Linear Transformations 

In the exercises we will ask you to prove the following two important facts about a linear transformation T: V → W in the case where V and W are finite-dimensional:

If dim(W) < dim(V), then T cannot be one-to-one.

If dim(V) < dim(W), then T cannot be onto.

Stated informally, if a linear transformation maps a “bigger” space to a “smaller” space, then some points in the “bigger” space must have the same image; and if a linear transformation maps a “smaller” space to a “bigger” space, then there must be points in the “bigger” space that are not images of any points in the “smaller” space.

These observations tell us, for example, that any linear transformation from R^3 to R^2 must map some distinct points of R^3 into the same point in R^2, and they also tell us that there is no linear transformation that maps R^2 onto all of R^3.


Isomorphism 

Our next definition paves the way for the main result in this section. 


DEFINITION 3 

If a linear transformation T: V → W is both one-to-one and onto, then T is said to be an isomorphism, and the vector spaces V and W are said to be isomorphic.


The word isomorphic is derived from the Greek words iso, meaning “identical,” and morphe, meaning “form.” This terminology is appropriate because, as we will now explain, isomorphic vector spaces have the same “algebraic form,” even though they may consist of different kinds of objects. To illustrate this idea, examine Table 1, in which we have shown how the isomorphism

$$a_0 + a_1x + a_2x^2 \;\to\; (a_0, a_1, a_2)$$

matches up vector operations in P2 and R^3.


Table 1

Operation in P2 | Operation in R^3
3(1 − 2x + 3x^2) = 3 − 6x + 9x^2 | 3(1, −2, 3) = (3, −6, 9)
(2 + x − x^2) + (1 − x + 5x^2) = 3 + 4x^2 | (2, 1, −1) + (1, −1, 5) = (3, 0, 4)
(4 + 2x + 3x^2) − (2 − 4x + 3x^2) = 2 + 6x | (4, 2, 3) − (2, −4, 3) = (2, 6, 0)


The following theorem, which is one of the most important results in linear algebra, reveals the fundamental importance of the vector space R^n.

THEOREM 8.2.3

Every real n-dimensional vector space is isomorphic to R^n.


Theorem 8.2.3 tells us that a real n-dimensional vector space may differ from R^n in notation, but its algebraic structure will be the same.


Proof Let V be a real n-dimensional vector space. To prove that V is isomorphic to R^n we must find a linear transformation T: V → R^n that is one-to-one and onto. For this purpose, let v1, v2, ..., vn be any basis for V, let

$$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \tag{1}$$

be the representation of a vector u in V as a linear combination of the basis vectors, and define the transformation T: V → R^n by

$$T(\mathbf{u}) = (k_1, k_2, \ldots, k_n) \tag{2}$$

We will show that T is an isomorphism (linear, one-to-one, and onto). To prove the linearity, let u and v be vectors in V, let c be a scalar, and let

$$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \quad\text{and}\quad \mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_n\mathbf{v}_n \tag{3}$$

be the representations of u and v as linear combinations of the basis vectors. Then it follows from (1) that

$$\begin{aligned} T(c\mathbf{u}) &= T(ck_1\mathbf{v}_1 + ck_2\mathbf{v}_2 + \cdots + ck_n\mathbf{v}_n) \\ &= (ck_1, ck_2, \ldots, ck_n) \\ &= c(k_1, k_2, \ldots, k_n) = cT(\mathbf{u}) \end{aligned}$$

and it follows from (2) that

$$\begin{aligned} T(\mathbf{u} + \mathbf{v}) &= T\big((k_1 + d_1)\mathbf{v}_1 + (k_2 + d_2)\mathbf{v}_2 + \cdots + (k_n + d_n)\mathbf{v}_n\big) \\ &= (k_1 + d_1, k_2 + d_2, \ldots, k_n + d_n) \\ &= (k_1, k_2, \ldots, k_n) + (d_1, d_2, \ldots, d_n) \\ &= T(\mathbf{u}) + T(\mathbf{v}) \end{aligned}$$

which shows that T is linear. To show that T is one-to-one, we must show that if u and v are distinct vectors in V, then so are their images in R^n. But if u ≠ v, and if the representations of these vectors in terms of the basis vectors are as in (3), then we must have ki ≠ di for at least one i. Thus,

$$T(\mathbf{u}) = (k_1, k_2, \ldots, k_n) \neq (d_1, d_2, \ldots, d_n) = T(\mathbf{v})$$

which shows that u and v have distinct images under T. Finally, the transformation T is onto, for if

$$\mathbf{w} = (k_1, k_2, \ldots, k_n)$$

is any vector in R^n, then it follows from (2) that w is the image under T of the vector

$$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n$$

Note that the isomorphism T in Formula (2) of the foregoing proof is the coordinate map

$$\mathbf{u} \;\to\; (k_1, k_2, \ldots, k_n) = (\mathbf{u})_S$$

that maps u into its coordinate vector with respect to the basis S = {v1, v2, ..., vn}. Since there are generally many possible bases for a given vector space V, there are generally many possible isomorphisms between V and R^n, one for each different basis.
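Evaluating the coordinate map in practice means solving (1) for k1, ..., kn. The following minimal Python sketch does this by Gauss-Jordan elimination in exact arithmetic. It assumes that each vector is given by its components in some fixed reference form (here, a polynomial in P2 is given by its coefficient list [a0, a1, a2]). The basis S = {1, 1 + x, 1 + x + x^2} and the helper name `coordinates` are hypothetical choices for illustration only.

```python
from fractions import Fraction

def coordinates(basis, u):
    """Coordinate vector (u)_S of u relative to the basis S = [v1, ..., vn].

    Each vector is given by its n components in a fixed reference form (here,
    polynomial coefficient lists [a0, a1, a2]). Solves k1*v1 + ... + kn*vn = u
    by Gauss-Jordan elimination on the augmented matrix [v1 ... vn | u].
    """
    n = len(basis)
    # row i holds the i-th component of each basis vector, followed by u_i
    aug = [[Fraction(v[i]) for v in basis] + [Fraction(u[i])] for i in range(n)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors are not a basis")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

S = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]  # hypothetical basis {1, 1 + x, 1 + x + x^2}
E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # natural basis {1, x, x^2}
u = [2, 3, 5]                          # 2 + 3x + 5x^2
v = [1, 0, -1]                         # 1 - x^2

print(coordinates(S, u) == [-1, -2, 5])  # True: u = -1(1) - 2(1 + x) + 5(1 + x + x^2)
print(coordinates(E, u) == [2, 3, 5])    # True: the natural basis reproduces the coefficients
```

Each basis gives a different isomorphism P2 → R^3, as noted above, and each one is linear: the coordinates of u + v equal the sum of the coordinates of u and of v.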


EXAMPLE 7 The Natural Isomorphism from P_{n−1} to R^n

We leave it for you to verify that the mapping

$$a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \;\to\; (a_0, a_1, \ldots, a_{n-1})$$

from P_{n−1} to R^n is one-to-one, onto, and linear. This is called the natural isomorphism from P_{n−1} to R^n because, as the following computations show, it maps the natural basis {1, x, x^2, ..., x^{n−1}} for P_{n−1} into the standard basis for R^n:

$$\begin{aligned} 1 &= 1 + 0x + 0x^2 + \cdots + 0x^{n-1} &&\to (1, 0, 0, \ldots, 0) \\ x &= 0 + x + 0x^2 + \cdots + 0x^{n-1} &&\to (0, 1, 0, \ldots, 0) \\ &\;\;\vdots &&\;\;\vdots \\ x^{n-1} &= 0 + 0x + 0x^2 + \cdots + x^{n-1} &&\to (0, 0, 0, \ldots, 1) \end{aligned}$$


EXAMPLE 8 The Natural Isomorphism from M22 to R^4

The matrices

$$E_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad E_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\quad E_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\quad E_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

form a basis for the vector space M22 of 2 × 2 matrices. An isomorphism T: M22 → R^4 can be constructed by first writing a matrix A in M22 in terms of the basis vectors as

$$A = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} = a_1E_1 + a_2E_2 + a_3E_3 + a_4E_4$$

and then defining T as

$$T(A) = (a_1, a_2, a_3, a_4)$$

Thus, for example,

$$T\left(\begin{bmatrix} 1 & -3 \\ 4 & 6 \end{bmatrix}\right) = (1, -3, 4, 6)$$

More generally, this idea can be used to show that the vector space M_mn of m × n matrices with real entries is isomorphic to R^{mn}.
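The row-by-row reading used in Example 8 extends to any m × n matrix. The following minimal Python sketch (the names `vec` and `unvec` are hypothetical) writes out the map M_mn → R^{mn} together with its inverse. Having a two-sided inverse is exactly what makes the map one-to-one and onto. Matrices are assumed to be lists of rows.

```python
def vec(A):
    """T(A): list the entries of an m x n matrix row by row, giving a vector in R^(mn)."""
    return tuple(x for row in A for x in row)

def unvec(v, m, n):
    """Inverse map R^(mn) -> M_mn: refill an m x n matrix row by row."""
    if len(v) != m * n:
        raise ValueError("expected exactly m*n components")
    return [list(v[i * n:(i + 1) * n]) for i in range(m)]

A = [[1, -3], [4, 6]]
print(vec(A))               # (1, -3, 4, 6)
print(unvec(vec(A), 2, 2))  # [[1, -3], [4, 6]]

# Linearity on an example: T(2A + B) = 2T(A) + T(B)
B = [[0, 5], [7, -1]]
lhs = vec([[2 * a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)])
rhs = tuple(2 * a + b for a, b in zip(vec(A), vec(B)))
print(lhs == rhs)           # True
```

Reading the entries column by column instead would give a different, equally valid isomorphism.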




















EXAMPLE 9 Differentiation by Matrix Multiplication

Consider the differentiation transformation D: P3 → P2 on the vector space of polynomials of degree three or less. If we map P3 and P2 into R^4 and R^3, respectively, by the natural isomorphisms, then the transformation D produces a corresponding matrix transformation from R^4 to R^3. Specifically, the derivative transformation

$$a_0 + a_1x + a_2x^2 + a_3x^3 \;\to\; a_1 + 2a_2x + 3a_3x^2$$

produces the matrix transformation

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 3a_3 \end{bmatrix}$$

Thus, for example, the derivative

$$\frac{d}{dx}\left(2 + x + 4x^2 - x^3\right) = 1 + 8x - 3x^2$$

can be calculated as the matrix product

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \\ 4 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 8 \\ -3 \end{bmatrix}$$

This idea is useful for constructing numerical algorithms to perform derivative calculations.
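As an illustration of that remark, the following minimal Python sketch builds the matrix of D: Pn → P_{n−1} relative to the natural bases for any n and reproduces the computation above. Polynomials are represented by their coefficient lists [a0, a1, ..., an], which is the natural isomorphism of Example 7. The function names are hypothetical.

```python
def derivative_matrix(n):
    """Matrix of D: P_n -> P_(n-1) relative to the natural bases.

    Column j holds the coordinates of D(x^j) = j*x^(j-1), so the only nonzero
    entry in row i is i + 1, in column i + 1.
    """
    return [[(i + 1 if j == i + 1 else 0) for j in range(n + 1)] for i in range(n)]

def matvec(M, v):
    """Matrix-vector product with M given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

D = derivative_matrix(3)
print(D)             # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
p = [2, 1, 4, -1]    # 2 + x + 4x^2 - x^3
print(matvec(D, p))  # [1, 8, -3], i.e. 1 + 8x - 3x^2
```

The first column of the matrix is zero. This reflects the fact that D sends constant polynomials to 0, so D is not one-to-one.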


Inner Product Space Isomorphisms 

In the case where V is a real n-dimensional inner product space, both V and R^n have, in addition to their algebraic structure, a geometric structure arising from their respective inner products. Thus, it is reasonable to ask whether there exists an isomorphism from V to R^n that preserves the geometric structure as well as the algebraic structure. For example, we would want orthogonal vectors in V to have orthogonal counterparts in R^n, and we would want orthonormal sets in V to correspond to orthonormal sets in R^n.

In order for an isomorphism to preserve geometric structure, it obviously has to preserve inner products, since notions of length, angle, and orthogonality are all based on the inner product. Thus, if V and W are inner product spaces, then we call an isomorphism T: V → W an inner product space isomorphism if

$$\langle T(\mathbf{u}), T(\mathbf{v})\rangle = \langle \mathbf{u}, \mathbf{v}\rangle \quad\text{for all } \mathbf{u} \text{ and } \mathbf{v} \text{ in } V$$

It can be proved that if V is any real n-dimensional inner product space and R^n has the Euclidean inner product (the dot product), then there exists an inner product space isomorphism from V to R^n. Under such an isomorphism, the inner product space V has the same algebraic and geometric structure as R^n. In this sense, every n-dimensional inner product space is a “carbon copy” of R^n with the Euclidean inner product that differs only in the notation used to represent vectors.


EXAMPLE 10 An Inner Product Space Isomorphism 














Let R^n be the vector space of real n-tuples in comma-delimited form, let M_n be the vector space of real n × 1 matrices, let R^n have the Euclidean inner product ⟨u, v⟩ = u · v, and let M_n have the inner product ⟨u, v⟩ = u^T v, in which u and v are expressed in column form. The mapping T: R^n → M_n defined by

$$(v_1, v_2, \ldots, v_n) \;\to\; \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

is an inner product space isomorphism, so the distinction between the inner product space R^n and the inner product space M_n is essentially notational, a fact that we have used many times in this text.

Concept Review 

• One-to-one
• Onto
• Isomorphism
• Isomorphic vector spaces
• Natural isomorphism
• Inner product space isomorphism

Skills

• Determine whether a linear transformation is one-to-one.
• Determine whether a linear transformation is onto.
• Determine whether a linear transformation is an isomorphism.


Exercise Set 8.2 

1. In each part, find ker(T), and determine whether the linear transformation T is one-to-one.

(a) T: R^2 → R^2, where T(x, y) = (y, x)

(b) T: R^2 → R^2, where T(x, y) = (0, 2x + 3y)

(c) T: R^2 → R^2, where T(x, y) = (x + y, x − y)

(d) T: R^2 → R^3, where T(x, y) = (x, y, x + y)

(e) T: R^2 → R^3, where T(x, y) = (x − y, y − x, 2x − 2y)

(f) T: R^3 → R^2, where T(x, y, z) = (x + y + z, x − y − z)

Answer:

(a) ker(T) = {0}; T is one-to-one

(b) ker(T) = {k(−3, 2)}; T is not one-to-one

(c) ker(T) = {0}; T is one-to-one

(d) ker(T) = {0}; T is one-to-one

(e) ker(T) = {k(1, 1)}; T is not one-to-one

(f) ker(T) = {k(0, 1, −1)}; T is not one-to-one


2. Which of the transformations in Exercise 1 are onto? 

3. In each part, determine whether multiplication by A is a one-to-one linear transformation. 


(a) 


A = 


(b) 


A = 


(c) 


A = 


Answer: 


1 -2 
2 -4 


-3 

1 
2 

-1 

4 -2 
1 5 

5 3 


6 

3 

-1 

3 


(a) Not one-to-one 

(b) Not one-to-one 

(c) One-to-one 

4. Which of the transformations in Exercise 3 are onto? 

5. As indicated in the accompanying figure, let T: R^2 → R^2 be the orthogonal projection on the line y = x.

(a) Find the kernel of T. 

(b) Is T one-to-one? Justify your conclusion. 

Figure Ex-5 (the line y = x, a vector x, and its projection T(x))


Answer: 

(a) ker(T) = {k(−1, 1)}

(b) T is not one-to-one since ker(T) ≠ {0}.

6. As indicated in the accompanying figure, let T: R^2 → R^2 be the linear operator that reflects each point about the y-axis.

(a) Find the kernel of T.

(b) Is T one-to-one? Justify your conclusion.

Figure Ex-6

7. In each part, use the given information to determine whether the linear transformation T is one-to-one. 

(a) T: R^m → R^m; nullity(T) = 0

(b) T: R^n → R^n; rank(T) = n − 1

(c) T: R^m → R^n; n < m

(d) T: R^n → R^n; R(T) = R^n


Answer: 

(a) T is one-to-one 

(b) T is not one-to-one 

(c) T is not one-to-one 

(d) T is one-to-one 

8. In each part, determine whether the linear transformation T is one-to-one. 

(a) T: P2 → P3, where T(a0 + a1x + a2x^2) = x(a0 + a1x + a2x^2)

(b) T: P2 → P2, where T(p(x)) = p(x + 1)

9. Prove: If V and W are finite-dimensional vector spaces such that dim(W) < dim(V), then there is no one-to-one linear transformation T: V → W.

10. Prove: There can be an onto linear transformation from V to W only if dim(V) ≥ dim(W).

11. (a) Find an isomorphism between the vector space of all 3 × 3 symmetric matrices and R^6.

(b) Find two different isomorphisms between the vector space of all 2 × 2 matrices and R^4.

(c) Find an isomorphism between the vector space of all polynomials of degree at most 3 such that p(0) = 0 and R^3.

(d) Find an isomorphism between the vector spaces span{1, sin(x), cos(x)} and R^3.


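Answer:

(a)

$$T\left(\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}\right) = (a, b, c, d, e, f)$$

(b)

$$T_1\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a, b, c, d), \qquad T_2\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a, c, b, d)$$

(c) T(ax^3 + bx^2 + cx) = (a, b, c)

(d) T(a + b sin(x) + c cos(x)) = (a, b, c)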


12. (Calculus required) Let J: P1 → R be the integration transformation

$$J(\mathbf{p}) = \int_{-1}^{1} p(x)\,dx$$

Determine whether J is one-to-one. Justify your conclusion.

13. (Calculus required) Let V be the vector space C^1[0, 1] and let T: V → R be defined by

$$T(\mathbf{f}) = f(0) + 2f'(0) + 3f'(1)$$

Verify that T is a linear transformation. Determine whether T is one-to-one, and justify your conclusion.

Answer:

T is not one-to-one since, for example, f(x) = x^2(x − 1)^2 is in its kernel.

14. (Calculus required) Devise a method for using matrix multiplication to differentiate functions in the vector space span{1, sin(x), cos(x), sin(2x), cos(2x)}. Use your method to find the derivative of 3 − 4 sin(x) + sin(2x) + 5 cos(2x).

15. Does the formula T(a, b, c) = ax^2 + bx + c define a one-to-one linear transformation from R^3 to P2? Explain your reasoning.

Answer: 


Yes; it is one-to-one 

16. Let E be a fixed 2 × 2 elementary matrix. Does the formula T(A) = EA define a one-to-one linear operator on M22? Explain your reasoning.

17. Let a be a fixed vector in R^3. Does the formula T(v) = a × v define a one-to-one linear operator on R^3? Explain your reasoning.

Answer:

T is not one-to-one since, for example, a is in its kernel.

18. Prove that an inner product space isomorphism preserves angles and distances; that is, the angle between u and v in V is equal to the angle between T(u) and T(v) in W, and ||u − v||_V = ||T(u) − T(v)||_W.

19. Does an inner product space isomorphism map orthonormal sets to orthonormal sets? Justify your answer. 

Answer: 


Yes 

20. Find an inner product space isomorphism between P$ and Mjs- 

True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) The vector spaces R^2 and P2 are isomorphic.

Answer: 

False 

(b) If the kernel of a linear transformation T: P3 → P3 is {0}, then T is an isomorphism.





Answer: 


True 

(c) Every linear transformation from M33 to P9 is an isomorphism.

Answer: 

False 

(d) There is a subspace of M 22 that is isomorphic to 
Answer: 

True 

(e) There is a 2 × 2 matrix P such that T: M22 → M22 defined by T(A) = AP − PA is an isomorphism.
Answer: 

False 

(f) There is a linear transformation T: P4 → P4 such that the kernel of T is isomorphic to the range of T.
Answer: 

False 
8.3 Compositions and Inverse Transformations

In Section 4.10 we discussed compositions and inverses of matrix transformations. In this section we will 
extend some of those ideas to general linear transformations. 


Composition of Linear Transformations 

The following definition extends Formula 1 of Section 4.10 to general linear transformations. 

Note that the word “with” establishes the order of the operations in a composition. The composition of T2 with T1 is

$$(T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u}))$$

whereas the composition of T1 with T2 is

$$(T_1 \circ T_2)(\mathbf{u}) = T_1(T_2(\mathbf{u}))$$


DEFINITION 1

If T1: U → V and T2: V → W are linear transformations, then the composition of T2 with T1, denoted by T2 ∘ T1 (which is read “T2 circle T1”), is the function defined by the formula

$$(T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u})) \tag{1}$$

where u is a vector in U.


Observe that this definition requires that the domain of T2 (which is V) contain the range of T1. This is essential for the formula T2(T1(u)) to make sense (Figure 8.3.1).

Figure 8.3.1 The composition of T2 with T1: T1 maps u in U to T1(u) in V, and T2 maps T1(u) to T2(T1(u)) in W.


Our first theorem shows that the composition of two linear transformations is itself a linear transformation. 


THEOREM 8.3.1

If T1: U → V and T2: V → W are linear transformations, then (T2 ∘ T1): U → W is also a linear transformation.


Proof If u and v are vectors in U and c is a scalar, then it follows from (1) and the linearity of T1 and T2 that

$$\begin{aligned} (T_2 \circ T_1)(\mathbf{u} + \mathbf{v}) &= T_2(T_1(\mathbf{u} + \mathbf{v})) = T_2(T_1(\mathbf{u}) + T_1(\mathbf{v})) \\ &= T_2(T_1(\mathbf{u})) + T_2(T_1(\mathbf{v})) \\ &= (T_2 \circ T_1)(\mathbf{u}) + (T_2 \circ T_1)(\mathbf{v}) \end{aligned}$$

and

$$\begin{aligned} (T_2 \circ T_1)(c\mathbf{u}) &= T_2(T_1(c\mathbf{u})) = T_2(cT_1(\mathbf{u})) \\ &= cT_2(T_1(\mathbf{u})) = c(T_2 \circ T_1)(\mathbf{u}) \end{aligned}$$

Thus, T2 ∘ T1 satisfies the two requirements of a linear transformation.


EXAMPLE 1 Composition of Linear Transformations 

Let T1: P1 → P2 and T2: P2 → P2 be the linear transformations given by the formulas

$$T_1(p(x)) = xp(x) \quad\text{and}\quad T_2(p(x)) = p(2x + 4)$$

Then the composition (T2 ∘ T1): P1 → P2 is given by the formula

$$(T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (2x + 4)p(2x + 4)$$

In particular, if p(x) = c0 + c1x, then

$$\begin{aligned} (T_2 \circ T_1)(p(x)) &= (T_2 \circ T_1)(c_0 + c_1x) = (2x + 4)\big(c_0 + c_1(2x + 4)\big) \\ &= c_0(2x + 4) + c_1(2x + 4)^2 \end{aligned}$$
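The composition in Example 1 can be checked with a minimal Python sketch that represents a polynomial by its coefficient list [c0, c1, ...]. This representation and the helper names are assumptions for illustration. T2 is evaluated by Horner's rule with 2x + 4 substituted for x, and `compose(S, T)` applies T first, matching the order in Definition 1.

```python
def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def T1(p):
    """T1(p(x)) = x p(x): every coefficient moves up one degree."""
    return [0] + p

def T2(p):
    """T2(p(x)) = p(2x + 4), evaluated by Horner's rule with 2x + 4 in place of x."""
    result = [0]
    for c in reversed(p):
        result = poly_add(poly_mul(result, [4, 2]), [c])
    return trim(result)

def compose(S, T):
    """(S o T)(u) = S(T(u)): apply T first, then S."""
    return lambda u: S(T(u))

T2_after_T1 = compose(T2, T1)
print(T2_after_T1([1, 1]))       # [20, 18, 4]: (2x + 4)(2x + 5) = 20 + 18x + 4x^2
print(compose(T1, T2)([1, 1]))   # [0, 5, 2]: the other order gives x(5 + 2x)
```

With c0 = 3 and c1 = −2, the formula c0(2x + 4) + c1(2x + 4)^2 gives −20 − 26x − 8x^2, which agrees with `T2_after_T1([3, -2])`.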


EXAMPLE 2 Composition with the Identity Operator 

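If T: V → V is any linear operator, and if I: V → V is the identity operator (Example 3 of Section 8.1), then for all vectors v in V, we have

$$\begin{aligned} (T \circ I)(\mathbf{v}) &= T(I(\mathbf{v})) = T(\mathbf{v}) \\ (I \circ T)(\mathbf{v}) &= I(T(\mathbf{v})) = T(\mathbf{v}) \end{aligned}$$

It follows that T ∘ I and I ∘ T are the same as T; that is,

$$T \circ I = T \quad\text{and}\quad I \circ T = T \tag{2}$$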


As illustrated in Figure 8.3.2, compositions can be defined for more than two linear transformations. For example, if

$$T_1: U \to V, \quad T_2: V \to W, \quad\text{and}\quad T_3: W \to Y$$

are linear transformations, then the composition T3 ∘ T2 ∘ T1 is defined by

$$(T_3 \circ T_2 \circ T_1)(\mathbf{u}) = T_3(T_2(T_1(\mathbf{u}))) \tag{3}$$

Figure 8.3.2 The composition of three linear transformations: u in U maps to T1(u) in V, then to T2(T1(u)) in W, then to T3(T2(T1(u))) in Y.


Inverse Linear Transformations 

In Theorem 4.10.1 we showed that a matrix operator T_A: R^n → R^n is one-to-one if and only if the matrix A is invertible, in which case the inverse operator is T_{A^{-1}}. We then showed that if w is the image of a vector x under the operator T_A, then x is the image under T_{A^{-1}} of the vector w (see Figure 4.10.8). Our next objective is to extend the notion of invertibility to general linear transformations.

Recall that if T: V → W is a linear transformation, then the range of T, denoted by R(T), is the subspace of W consisting of all images under T of vectors in V. If T is one-to-one, then each vector w in R(T) is the image of a unique vector v in V. This uniqueness allows us to define a new function, called the inverse of T and denoted by T^{-1}, that maps w back into v (Figure 8.3.3).

Figure 8.3.3 The inverse of T maps T(v) back into v: T sends v in V to w = T(v) in R(T), and T^{-1} sends w back to v.

It can be proved (Exercise 19) that T^{-1}: R(T) → V is a linear transformation. Moreover, it follows from the definition of T^{-1} that

$$T^{-1}(T(\mathbf{v})) = T^{-1}(\mathbf{w}) = \mathbf{v} \tag{4}$$

$$T(T^{-1}(\mathbf{w})) = T(\mathbf{v}) = \mathbf{w} \tag{5}$$

so that T and T^{-1}, when applied in succession in either order, cancel the effect of each other.


It is important to note that if T: V → W is a one-to-one linear transformation, then the domain of T^{-1} is the range of T, where the range may or may not be all of W. However, in the special case where T: V → V is a one-to-one linear operator and V is n-dimensional, it follows from Theorem 8.2.2 that T must also be onto, so the domain of T^{-1} is all of V.

EXAMPLE 3 An Inverse Transformation 

In Example 5 of Section 8.2 we showed that the linear transformation T: Pn → P_{n+1} given by

$$T(\mathbf{p}) = T(p(x)) = xp(x)$$

is one-to-one; thus, T has an inverse. In this case the range of T is not all of P_{n+1} but rather the subspace of P_{n+1} consisting of polynomials with a zero constant term. This is evident from the formula for T:

$$T(c_0 + c_1x + \cdots + c_nx^n) = c_0x + c_1x^2 + \cdots + c_nx^{n+1}$$

It follows that T^{-1}: R(T) → Pn is given by the formula

$$T^{-1}(c_0x + c_1x^2 + \cdots + c_nx^{n+1}) = c_0 + c_1x + \cdots + c_nx^n$$

For example, in the case where n ≥ 3,

$$T^{-1}(2x - x^2 + 5x^3 + 3x^4) = 2 - x + 5x^2 + 3x^3$$
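On coefficient lists [c0, c1, ..., cn] (an assumed representation), T prepends a zero and T^{-1} removes it. The following minimal Python sketch makes the restricted domain R(T) explicit by rejecting any polynomial with a nonzero constant term. The names `T` and `T_inv` are illustrative.

```python
def T(p):
    """T(p(x)) = x p(x) on coefficient lists [c0, c1, ..., cn]."""
    return [0] + list(p)

def T_inv(q):
    """T^{-1}: R(T) -> P_n, defined only on polynomials with zero constant term."""
    if not q or q[0] != 0:
        raise ValueError("not in the range of T: constant term must be 0")
    return list(q[1:])

q = [0, 2, -1, 5, 3]   # 2x - x^2 + 5x^3 + 3x^4
print(T_inv(q))        # [2, -1, 5, 3], i.e. 2 - x + 5x^2 + 3x^3
print(T(T_inv(q)) == q, T_inv(T([7, 0, 1])) == [7, 0, 1])  # True True
```

The last line checks Formulas (5) and (4) on these examples. Calling `T_inv([1, 2])` raises `ValueError`, because 1 + 2x is not in R(T).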


EXAMPLE 4 An Inverse Transformation 

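Let T: R^3 → R^3 be the linear operator defined by the formula

$$T(x_1, x_2, x_3) = (3x_1 + x_2,\ -2x_1 - 4x_2 + 3x_3,\ 5x_1 + 4x_2 - 2x_3)$$

Determine whether T is one-to-one; if so, find T^{-1}(x1, x2, x3).

Solution It follows from Formula 12 of Section 4.9 that the standard matrix for T is

$$[T] = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix}$$

(verify). This matrix is invertible, and from Formula 7 of Section 4.10 the standard matrix for T^{-1} is

$$[T^{-1}] = [T]^{-1} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix}$$

It follows that

$$T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = [T^{-1}]\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4x_1 - 2x_2 - 3x_3 \\ -11x_1 + 6x_2 + 9x_3 \\ -12x_1 + 7x_2 + 10x_3 \end{bmatrix}$$

Expressing this result in horizontal notation yields

$$T^{-1}(x_1, x_2, x_3) = (4x_1 - 2x_2 - 3x_3,\ -11x_1 + 6x_2 + 9x_3,\ -12x_1 + 7x_2 + 10x_3)$$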
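The inverse matrix in Example 4 can be confirmed with a minimal Python sketch of Gauss-Jordan elimination on [A | I] in exact rational arithmetic. This is the standard row-reduction method, not necessarily the route behind Formula 7 of Section 4.10. The function names are hypothetical, and a singular matrix is reported as an error, which corresponds to T not being one-to-one.

```python
from fractions import Fraction

def inverse(A):
    """Inverse of a square matrix by Gauss-Jordan elimination on [A | I], in exact arithmetic."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular, so the operator is not one-to-one")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

T_std = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]  # [T] from the example
T_inv_std = inverse(T_std)
print(T_inv_std == [[4, -2, -3], [-11, 6, 9], [-12, 7, 10]])  # True

x = [1, 2, 3]
print(matvec(T_std, x))                          # [5, -1, 7]
print(matvec(T_inv_std, matvec(T_std, x)) == x)  # True: T^{-1}(T(x)) = x
```

Exact fractions are used so that the comparison with the integer matrix in the text is not disturbed by floating-point rounding.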



Composition of One-To-One Linear Transformations 

The following theorem shows that a composition of one-to-one linear transformations is one-to-one, and it 
relates the inverse of a composition to the inverses of its individual linear transformations. 


THEOREM 8.3.2 

If T1: U → V and T2: V → W are one-to-one linear transformations, then

(a) T2 ∘ T1 is one-to-one.

(b) (T2 ∘ T1)^{-1} = T1^{-1} ∘ T2^{-1}.


Proof (a) We want to show that T2 ∘ T1 maps distinct vectors in U into distinct vectors in W. But if u and v are distinct vectors in U, then T1(u) and T1(v) are distinct vectors in V since T1 is one-to-one. This and the fact that T2 is one-to-one imply that

$$T_2(T_1(\mathbf{u})) \quad\text{and}\quad T_2(T_1(\mathbf{v}))$$

are also distinct vectors. But these expressions can also be written as

$$(T_2 \circ T_1)(\mathbf{u}) \quad\text{and}\quad (T_2 \circ T_1)(\mathbf{v})$$

so T2 ∘ T1 maps u and v into distinct vectors in W.

Proof (b) We want to show that

$$(T_2 \circ T_1)^{-1}(\mathbf{w}) = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$$

for every vector w in the range of T2 ∘ T1. For this purpose, let

$$\mathbf{u} = (T_2 \circ T_1)^{-1}(\mathbf{w}) \tag{6}$$

so our goal is to show that

$$\mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$$

But it follows from (6) that

$$(T_2 \circ T_1)(\mathbf{u}) = \mathbf{w}$$

or, equivalently,

$$T_2(T_1(\mathbf{u})) = \mathbf{w}$$

Now, taking T2^{-1} of each side of this equation, then taking T1^{-1} of each side of the result, and then using (4) yields (verify)

$$\mathbf{u} = T_1^{-1}(T_2^{-1}(\mathbf{w}))$$

or, equivalently,

$$\mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$$


In words, part (b) of Theorem 8.3.2 states that the inverse of a composition is the composition of the inverses in the reverse order. This result can be extended to compositions of three or more linear transformations; for example,

$$(T_3 \circ T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1} \circ T_3^{-1} \tag{7}$$

In the case where T_A, T_B, and T_C are matrix operators on R^n, Formula (7) can be written as

$$(T_C \circ T_B \circ T_A)^{-1} = T_A^{-1} \circ T_B^{-1} \circ T_C^{-1}$$

or alternatively as

$$(T_{CBA})^{-1} = T_{A^{-1}B^{-1}C^{-1}} \tag{8}$$

Note the order of the subscripts on the two sides of Formula 8.
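Formula 8 can be checked numerically. The following minimal Python sketch uses three invertible 2 × 2 matrices A, B, and C, which are arbitrary illustrative choices, and inverts them with the adjugate formula in exact arithmetic. It confirms that (CBA)^{-1} = A^{-1}B^{-1}C^{-1} and that keeping the inverses in the original order gives a different matrix for these examples.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix [[a, b], [c, d]] via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 5]]
B = [[2, 1], [1, 1]]
C = [[1, 1], [0, 2]]

CBA = matmul(C, matmul(B, A))                      # T_C o T_B o T_A has matrix CBA
lhs = inv2(CBA)                                    # (T_CBA)^{-1}
rhs = matmul(inv2(A), matmul(inv2(B), inv2(C)))    # A^{-1} B^{-1} C^{-1}
wrong = matmul(inv2(C), matmul(inv2(B), inv2(A)))  # same inverses, original order

print(lhs == rhs)    # True
print(lhs == wrong)  # False for these matrices
```

The matrix `wrong` is actually (ABC)^{-1}, and it differs from `lhs` here because ABC ≠ CBA.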


Concept Review 

• Composition of linear transformations
• Inverse of a linear transformation

Skills

• Find the domain and range of the composition of two linear transformations.
• Find the composition of two linear transformations.
• Determine whether a linear transformation has an inverse.
• Find the inverse of a linear transformation.


Exercise Set 8.3 


1. Find (T2 ∘ T1)(x, y).

(a) T1(x, y) = (2x, 3y), T2(x, y) = (x − y, x + y)

(b) T1(x, y) = (x − 3y, 0), T2(x, y) = (4x − 5y, 3x − 6y)

(c) T1(x, y) = (2x, −3y, x + y), T2(x, y, z) = (x − y, y + z)

(d) T1(x, y) = (x − y, y, x), T2(x, y, z) = (0, x + y + z)

Answer:

(a) (T2 ∘ T1)(x, y) = (2x − 3y, 2x + 3y)

(b) (T2 ∘ T1)(x, y) = (4x − 12y, 3x − 9y)

(c) (T2 ∘ T1)(x, y) = (2x + 3y, x − 2y)

(d) (T2 ∘ T1)(x, y) = (0, 2x)

2. Find (T3 ∘ T2 ∘ T1)(x, y).

(a) T1(x, y) = (−2y, 2x, x − 2y), T2(x, y, z) = (y, z, x), T3(x, y, z) = (x + z, y − z)

(b) T1(x, y) = (x + y, y, −x), T2(x, y, z) = (0, x + y + z, 3y), T3(x, y, z) = (3x + 2y, 4z − x − 3y)

3. Let T1: M22 → R and T2: M22 → M22 be the linear transformations given by T1(A) = tr(A) and T2(A) = A^T.

(a) Find (T1 ∘ T2)(A), where

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

(b) Can you find (T2 ∘ T1)(A)? Explain.

Answer:

(a) a + d

(b) (T2 ∘ T1)(A) does not exist since T1(A) is not a 2 × 2 matrix.

4. Let T1: Pn → Pn and T2: Pn → Pn be the linear operators given by T1(p(x)) = p(x − 1) and T2(p(x)) = p(x + 1). Find (T1 ∘ T2)(p(x)) and (T2 ∘ T1)(p(x)).

5. Let T1: V → V be the dilation T1(v) = 4v. Find a linear operator T2: V → V such that T1 ∘ T2 = I and T2 ∘ T1 = I.

Answer:

T2(v) = (1/4)v

6. Suppose that the linear transformations T1: P2 → P2 and T2: P2 → P3 are given by the formulas T1(p(x)) = p(x + 1) and T2(p(x)) = xp(x). Find (T2 ∘ T1)(a0 + a1x + a2x^2).

7. Let q0(x) be a fixed polynomial of degree m, and define a function T with domain Pn by the formula T(p(x)) = p(q0(x)). Show that T is a linear transformation.

8. Use the definition of T3 ∘ T2 ∘ T1 given by Formula 3 to prove that

(a) T3 ∘ T2 ∘ T1 is a linear transformation.

(b) T3 ∘ T2 ∘ T1 = (T3 ∘ T2) ∘ T1.

(c) T3 ∘ T2 ∘ T1 = T3 ∘ (T2 ∘ T1).

9. Let T: R^3 → R^3 be the orthogonal projection of R^3 onto the xy-plane. Show that T ∘ T = T.

10. In each part, let T: R^2 → R^2 be multiplication by A. Determine whether T has an inverse; if so, find T^{-1}(x1, x2).



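11. In each part, let $T: R^3 \to R^3$ be multiplication by $A$. Determine whether $T$ has an inverse; if so, find $T^{-1}\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right)$.

(a) $A = \begin{bmatrix}1 & 5 & 2\\ 1 & 2 & 1\\ -1 & 1 & 0\end{bmatrix}$

(b) $A = \begin{bmatrix}1 & 4 & -1\\ 1 & 2 & 1\\ -1 & 1 & 0\end{bmatrix}$

(c) $A = \begin{bmatrix}1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 1 & 0\end{bmatrix}$

(d) $A = \begin{bmatrix}1 & -1 & 1\\ 0 & 2 & -1\\ 2 & 3 & 0\end{bmatrix}$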


Answer: 


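(a) $T$ has no inverse.

(b)
$$T^{-1}\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right) = \begin{bmatrix}\frac{1}{8}x_1 + \frac{1}{8}x_2 - \frac{3}{4}x_3\\ \frac{1}{8}x_1 + \frac{1}{8}x_2 + \frac{1}{4}x_3\\ -\frac{3}{8}x_1 + \frac{5}{8}x_2 + \frac{1}{4}x_3\end{bmatrix}$$

(c)
$$T^{-1}\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right) = \begin{bmatrix}\frac{1}{2}x_1 - \frac{1}{2}x_2 + \frac{1}{2}x_3\\ -\frac{1}{2}x_1 + \frac{1}{2}x_2 + \frac{1}{2}x_3\\ \frac{1}{2}x_1 + \frac{1}{2}x_2 - \frac{1}{2}x_3\end{bmatrix}$$

(d)
$$T^{-1}\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right) = \begin{bmatrix}3x_1 + 3x_2 - x_3\\ -2x_1 - 2x_2 + x_3\\ -4x_1 - 5x_2 + 2x_3\end{bmatrix}$$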
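The answers to Exercise 11 can be checked mechanically. The sketch below is one way to do it, not the book's method: it inverts each matrix by Gauss-Jordan elimination on $[A \mid I]$, using exact `Fraction` arithmetic so that entries such as $\frac{1}{8}$ come out exactly. The function name `inverse` is an illustrative choice.

```python
from fractions import Fraction

def inverse(matrix):
    """Return the inverse of a square matrix by Gauss-Jordan elimination,
    using exact Fractions; return None if the matrix is singular."""
    n = len(matrix)
    # Build the augmented matrix [A | I]
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot in this column, so A is not invertible
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # scale the pivot row to a leading 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # the right half is A^{-1}

A_a = [[1, 5, 2], [1, 2, 1], [-1, 1, 0]]
A_d = [[1, -1, 1], [0, 2, -1], [2, 3, 0]]
print(inverse(A_a) is None)  # True: T has no inverse in part (a)
print([[int(v) for v in row] for row in inverse(A_d)])  # [[3, 3, -1], [-2, -2, 1], [-4, -5, 2]]
```

The rows of each returned matrix give the coefficients of $x_1, x_2, x_3$ in the corresponding component of $T^{-1}$.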


12. In each part, determine whether the linear operator $T: R^n \to R^n$ is one-to-one; if so, find $T^{-1}(x_1, x_2, \ldots, x_n)$.

(a) $T(x_1, x_2, \ldots, x_n) = (0, x_1, x_2, \ldots, x_{n-1})$

(b) $T(x_1, x_2, \ldots, x_n) = (x_n, x_{n-1}, \ldots, x_2, x_1)$

(c) $T(x_1, x_2, \ldots, x_n) = (x_2, x_3, \ldots, x_n, x_1)$

13. Let $T: R^n \to R^n$ be the linear operator defined by the formula

$T(x_1, x_2, \ldots, x_n) = (a_1x_1, a_2x_2, \ldots, a_nx_n)$

where $a_1, \ldots, a_n$ are constants.

(a) Under what conditions will $T$ have an inverse?

(b) Assuming that the conditions determined in part (a) are satisfied, find a formula for $T^{-1}(x_1, x_2, \ldots, x_n)$.

Answer:

(a) $a_i \neq 0$ for $i = 1, 2, 3, \ldots, n$

(b) $T^{-1}(x_1, x_2, \ldots, x_n) = \left(\frac{1}{a_1}x_1, \frac{1}{a_2}x_2, \ldots, \frac{1}{a_n}x_n\right)$

14. Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the linear operators given by the formulas

$T_1(x,y) = (x + y, x - y)$ and $T_2(x,y) = (2x + y, x - 2y)$

(a) Show that $T_1$ and $T_2$ are one-to-one.

(b) Find formulas for

$T_1^{-1}(x,y)$, $T_2^{-1}(x,y)$, $(T_2\circ T_1)^{-1}(x,y)$

(c) Verify that $(T_2\circ T_1)^{-1} = T_1^{-1}\circ T_2^{-1}$.



15. Let $T_1: P_2 \to P_3$ and $T_2: P_3 \to P_3$ be the linear transformations given by the formulas

$T_1(p(x)) = xp(x)$ and $T_2(p(x)) = p(x + 1)$

(a) Find formulas for $T_1^{-1}(p(x))$, $T_2^{-1}(p(x))$, and $(T_2\circ T_1)^{-1}(p(x))$.

(b) Verify that $(T_2\circ T_1)^{-1} = T_1^{-1}\circ T_2^{-1}$.

Answer:

(a) $T_1^{-1}(p(x)) = \frac{1}{x}p(x)$, $T_2^{-1}(p(x)) = p(x - 1)$; $(T_2\circ T_1)^{-1}(p(x)) = \frac{1}{x}p(x - 1)$

16. Let $T_1: R^3 \to R^3$, $T_2: R^3 \to R^3$, and $T_3: R^3 \to R^3$ be the reflections about the $xy$-plane, the $xz$-plane, and the $yz$-plane, respectively. Verify Formula 8 for these linear operators.

17. Let $T: P_1 \to R^2$ be the function defined by the formula

$T(p(x)) = (p(0), p(1))$

(a) Find $T(1 - 2x)$.

(b) Show that $T$ is a linear transformation.

(c) Show that $T$ is one-to-one.

(d) Find $T^{-1}(2, 3)$, and sketch its graph.

Answer:

(a) $(1, -1)$

(d) $T^{-1}(2, 3) = 2 + x$

18. Let $T: R^2 \to R^2$ be the linear operator given by the formula $T(x, y) = (x + ky, -y)$. Show that $T$ is one-to-one and that $T^{-1} = T$ for every real value of $k$.

19. Prove: If $T: V \to W$ is a one-to-one linear transformation, then $T^{-1}: R(T) \to V$ is a one-to-one linear transformation.

In Exercises 20-21, determine whether $T_1\circ T_2 = T_2\circ T_1$.

20. (a) $T_1: R^2 \to R^2$ is the orthogonal projection on the $x$-axis, and $T_2: R^2 \to R^2$ is the orthogonal projection on the $y$-axis.

(b) $T_1: R^2 \to R^2$ is the rotation about the origin through an angle $\theta_1$, and $T_2: R^2 \to R^2$ is the rotation about the origin through an angle $\theta_2$.

(c) $T_1: R^3 \to R^3$ is the rotation about the $x$-axis through an angle $\theta_1$, and $T_2: R^3 \to R^3$ is the rotation about the $z$-axis through an angle $\theta_2$.

21. (a) $T_1: R^2 \to R^2$ is the reflection about the $x$-axis, and $T_2: R^2 \to R^2$ is the reflection about the $y$-axis.

(b) $T_1: R^2 \to R^2$ is the orthogonal projection on the $x$-axis, and $T_2: R^2 \to R^2$ is the counterclockwise rotation through an angle $\theta$.

(c) $T_1: R^3 \to R^3$ is a dilation by a factor $k$, and $T_2: R^3 \to R^3$ is the counterclockwise rotation about the $z$-axis through an angle $\theta$.


Answer: 


(a) $T_1\circ T_2 = T_2\circ T_1$

(b) $T_1\circ T_2 \neq T_2\circ T_1$

(c) $T_1\circ T_2 = T_2\circ T_1$
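Parts (a) and (b) of Exercise 21 can be spot-checked numerically by comparing the products of standard matrices in both orders, since the composition $T_1\circ T_2$ has standard matrix $[T_1][T_2]$. This is only a sketch: the angle $\theta = \pi/6$ is an arbitrary sample value, so a single run illustrates the answers but does not prove them for every $\theta$.

```python
import math

def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def close(X, Y, tol=1e-12):
    """Entrywise comparison with a tolerance for floating-point round-off."""
    return all(abs(a - b) < tol for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

reflect_x = [[1, 0], [0, -1]]   # reflection about the x-axis
reflect_y = [[-1, 0], [0, 1]]   # reflection about the y-axis
project_x = [[1, 0], [0, 0]]    # orthogonal projection on the x-axis
theta = math.pi / 6             # sample angle (an arbitrary choice)
rotate = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]

print(close(matmul(reflect_x, reflect_y), matmul(reflect_y, reflect_x)))  # True, part (a)
print(close(matmul(project_x, rotate), matmul(rotate, project_x)))        # False, part (b)
```

Part (c) needs no computation: a dilation has standard matrix $kI$, which commutes with every matrix.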


22. (Calculus required) Let

$D(\mathbf{f}) = f'(x)$ and $J(\mathbf{f}) = \int_0^x f(t)\,dt$

be the linear transformations in Examples 11 and 12 of Section 8.1. Find $(J\circ D)(\mathbf{f})$ for

(a) $f(x) = x^2 + 3x + 2$

(b) $f(x) = \sin x$

(c) $f(x) = e^x + 3$


23. (Calculus required) The Fundamental Theorem of Calculus implies that integration and differentiation reverse the actions of each other. Define a transformation $D: P_n \to P_{n-1}$ by $D(p(x)) = p'(x)$, and define $J: P_{n-1} \to P_n$ by

$$J(p(x)) = \int_0^x p(t)\,dt$$

(a) Show that $D$ and $J$ are linear transformations.

(b) Explain why $J$ is not the inverse transformation of $D$.

(c) Can the domains and/or codomains of $D$ and $J$ be restricted so they are inverse linear transformations?


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) The composition of two linear transformations is also a linear transformation. 
Answer: 

True 

(b) If $T_1: V \to V$ and $T_2: V \to V$ are any two linear operators, then $T_1\circ T_2 = T_2\circ T_1$.
Answer: 

False 

(c) The inverse of a linear transformation is a linear transformation. 


Answer: 


False 

(d) If a linear transformation $T$ has an inverse, then the kernel of $T$ is the zero subspace.

Answer: 

True 

(e) If $T: R^2 \to R^2$ is the orthogonal projection onto the $x$-axis, then $T^{-1}: R^2 \to R^2$ maps each point on the $x$-axis onto a line that is perpendicular to the $x$-axis.

Answer: 

False 

(f) If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and if $T_1$ is not one-to-one, then neither is $T_2\circ T_1$.

Answer: 

True 
8.4 Matrices for General Linear Transformations

In this section we will show that a general linear transformation from any $n$-dimensional vector space $V$ to any $m$-dimensional vector space $W$ can be performed using an appropriate matrix transformation from $R^n$ to $R^m$. This idea is used in computer computations since computers are well suited for performing matrix computations.


Matrices of Linear Transformations 


Suppose that $V$ is an $n$-dimensional vector space, $W$ is an $m$-dimensional vector space, and that $T: V \to W$ is a linear transformation. Suppose further that $B$ is a basis for $V$, that $B'$ is a basis for $W$, and that for each vector $\mathbf{x}$ in $V$, the coordinate matrices for $\mathbf{x}$ and $T(\mathbf{x})$ are $[\mathbf{x}]_B$ and $[T(\mathbf{x})]_{B'}$, respectively (Figure 8.4.1).


[Figure 8.4.1: a vector $\mathbf{x}$ in $V$ ($n$-dimensional) is paired with its coordinate vector $[\mathbf{x}]_B$, a vector in $R^n$.]

It will be our goal to find an $m\times n$ matrix $A$ such that multiplication by $A$ maps the vector $[\mathbf{x}]_B$ into the vector $[T(\mathbf{x})]_{B'}$ for each $\mathbf{x}$ in $V$ (Figure 8.4.2a). If we can do so, then, as illustrated in Figure 8.4.2b, we will be able to execute the linear transformation $T$ by using matrix multiplication and the following indirect procedure:


Finding $T(\mathbf{x})$ Indirectly


Compute the coordinate vector $[\mathbf{x}]_B$.

Multiply $[\mathbf{x}]_B$ on the left by $A$ to produce $[T(\mathbf{x})]_{B'}$.
Reconstruct $T(\mathbf{x})$ from its coordinate vector $[T(\mathbf{x})]_{B'}$.


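[Figure 8.4.2: (a) $T$ maps $V$ into $W$, and multiplication by $A$ maps $R^n$ into $R^m$. (b) $T(\mathbf{x})$ is obtained either by direct computation or by the indirect route (1) $\mathbf{x} \to [\mathbf{x}]_B$, (2) multiply by $A$ to get $[T(\mathbf{x})]_{B'}$, (3) $[T(\mathbf{x})]_{B'} \to T(\mathbf{x})$.]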
























The key to executing this plan is to find an $m\times n$ matrix $A$ with the property that

$$A[\mathbf{x}]_B = [T(\mathbf{x})]_{B'} \qquad (1)$$


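For this purpose, let $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be a basis for the $n$-dimensional space $V$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ a basis for the $m$-dimensional space $W$. Since Equation 1 must hold for all vectors in $V$, it must hold, in particular, for the basis vectors in $B$; that is,

$$A[\mathbf{u}_1]_B = [T(\mathbf{u}_1)]_{B'},\quad A[\mathbf{u}_2]_B = [T(\mathbf{u}_2)]_{B'},\ \ldots,\quad A[\mathbf{u}_n]_B = [T(\mathbf{u}_n)]_{B'} \qquad (2)$$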


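But

$$[\mathbf{u}_1]_B = \begin{bmatrix}1\\0\\0\\ \vdots\\0\end{bmatrix},\quad [\mathbf{u}_2]_B = \begin{bmatrix}0\\1\\0\\ \vdots\\0\end{bmatrix},\ \ldots,\quad [\mathbf{u}_n]_B = \begin{bmatrix}0\\0\\0\\ \vdots\\1\end{bmatrix}$$

so, writing $A = [a_{ij}]$,

$$A[\mathbf{u}_1]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{bmatrix}\begin{bmatrix}1\\0\\ \vdots\\0\end{bmatrix} = \begin{bmatrix}a_{11}\\a_{21}\\ \vdots\\a_{m1}\end{bmatrix}$$

and similarly

$$A[\mathbf{u}_2]_B = \begin{bmatrix}a_{12}\\a_{22}\\ \vdots\\a_{m2}\end{bmatrix},\ \ldots,\quad A[\mathbf{u}_n]_B = \begin{bmatrix}a_{1n}\\a_{2n}\\ \vdots\\a_{mn}\end{bmatrix}$$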


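Substituting these results into 2 yields

$$\begin{bmatrix}a_{11}\\a_{21}\\ \vdots\\a_{m1}\end{bmatrix} = [T(\mathbf{u}_1)]_{B'},\quad \begin{bmatrix}a_{12}\\a_{22}\\ \vdots\\a_{m2}\end{bmatrix} = [T(\mathbf{u}_2)]_{B'},\ \ldots,\quad \begin{bmatrix}a_{1n}\\a_{2n}\\ \vdots\\a_{mn}\end{bmatrix} = [T(\mathbf{u}_n)]_{B'}$$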


which shows that the successive columns of $A$ are the coordinate vectors of

$T(\mathbf{u}_1), T(\mathbf{u}_2), \ldots, T(\mathbf{u}_n)$

with respect to the basis $B'$. Thus, the matrix $A$ that completes the link in Figure 8.4.2a is

$$A = \big[\,[T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \mid \cdots \mid [T(\mathbf{u}_n)]_{B'}\,\big] \qquad (3)$$



We will call this the matrix for $T$ relative to the bases $B$ and $B'$ and will denote it by the symbol $[T]_{B',B}$. Using this notation, Formula 3 can be written as

$$[T]_{B',B} = \big[\,[T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \mid \cdots \mid [T(\mathbf{u}_n)]_{B'}\,\big] \qquad (4)$$

and from 1, this matrix has the property

$$[T]_{B',B}[\mathbf{x}]_B = [T(\mathbf{x})]_{B'} \qquad (5)$$

We leave it as an exercise to show that in the special case where $T_A: R^n \to R^m$ is multiplication by $A$, and where $B$ and $B'$ are the standard bases for $R^n$ and $R^m$, respectively, then

$$[T_A]_{B',B} = A \qquad (6)$$


Observe that in the notation $[T]_{B',B}$ the right subscript is a basis for the domain of $T$, and the left subscript is a basis for the image space of $T$ (Figure 8.4.3). Moreover, observe how the subscript $B$ seems to "cancel out" in Formula 5 (Figure 8.4.4).




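[Figure 8.4.3: in $[T]_{B',B}$, the left subscript $B'$ is the basis for the image space and the right subscript $B$ is the basis for the domain.]

[Figure 8.4.4: cancellation of the subscript $B$ in Formula 5.]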


EXAMPLE 1 Matrix for a Linear Transformation 

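Let $T: P_1 \to P_2$ be the linear transformation defined by

$T(p(x)) = xp(x)$

Find the matrix for $T$ with respect to the standard bases

$B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$

where

$\mathbf{u}_1 = 1,\ \mathbf{u}_2 = x;\quad \mathbf{v}_1 = 1,\ \mathbf{v}_2 = x,\ \mathbf{v}_3 = x^2$


From the given formula for $T$ we obtain

$T(\mathbf{u}_1) = T(1) = (x)(1) = x$

$T(\mathbf{u}_2) = T(x) = (x)(x) = x^2$

By inspection, the coordinate vectors for $T(\mathbf{u}_1)$ and $T(\mathbf{u}_2)$ relative to $B'$ are

$$[T(\mathbf{u}_1)]_{B'} = \begin{bmatrix}0\\1\\0\end{bmatrix},\quad [T(\mathbf{u}_2)]_{B'} = \begin{bmatrix}0\\0\\1\end{bmatrix}$$

Thus, the matrix for $T$ with respect to $B$ and $B'$ is

$$[T]_{B',B} = \big[\,[T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'}\,\big] = \begin{bmatrix}0 & 0\\ 1 & 0\\ 0 & 1\end{bmatrix}$$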
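Formula 4 translates almost word for word into code: apply $T$ to each vector of $B$, take coordinates relative to $B'$, and use the results as columns. The sketch below assumes, as an illustrative convention not taken from the text, that a polynomial $c_0 + c_1x + \cdots$ is stored as the coefficient list `[c0, c1, ...]`; the helper names `matrix_for`, `times_x`, and `coords_standard` are hypothetical. It reproduces the matrix of Example 1.

```python
def matrix_for(T, basis_B, coords_Bprime):
    """Formula 4: column j of [T]_{B',B} is the coordinate vector [T(u_j)]_{B'}."""
    columns = [coords_Bprime(T(u)) for u in basis_B]
    # Transpose the list of columns into a list of rows
    return [list(row) for row in zip(*columns)]

def times_x(p):
    """T(p(x)) = x p(x): every coefficient moves up one degree."""
    return [0] + list(p)

def coords_standard(p, dim):
    """Coordinates relative to {1, x, ..., x^(dim-1)}, padded with zeros."""
    return list(p) + [0] * (dim - len(p))

B = [[1], [0, 1]]  # u1 = 1, u2 = x
A = matrix_for(times_x, B, lambda p: coords_standard(p, 3))
print(A)  # [[0, 0], [1, 0], [0, 1]]
```

Because `matrix_for` receives $T$ and the coordinate map as functions, the same few lines work for any pair of finite bases, as long as a coordinate map for $B'$ is supplied.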


EXAMPLE 2 The Three-Step Procedure 


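Let $T: P_1 \to P_2$ be the linear transformation in Example 1, and use the three-step procedure described in the following figure to perform the computation

$$T(a + bx) = x(a + bx) = ax + bx^2$$

[Figure: $\mathbf{x} \to T(\mathbf{x})$ by direct computation; indirectly, (1) $\mathbf{x} \to [\mathbf{x}]_B$, (2) multiply by $[T]_{B',B}$ to obtain $[T(\mathbf{x})]_{B'}$, (3) $[T(\mathbf{x})]_{B'} \to T(\mathbf{x})$.]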


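Solution

The coordinate matrix for $\mathbf{x} = a + bx$ relative to the basis $B = \{1, x\}$ is

$$[\mathbf{x}]_B = \begin{bmatrix}a\\b\end{bmatrix}$$

Multiplying $[\mathbf{x}]_B$ by the matrix $[T]_{B',B}$ found in Example 1 we obtain

$$[T]_{B',B}[\mathbf{x}]_B = \begin{bmatrix}0 & 0\\ 1 & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}0\\a\\b\end{bmatrix} = [T(\mathbf{x})]_{B'}$$

Reconstructing $T(\mathbf{x}) = T(a + bx)$ from $[T(\mathbf{x})]_{B'}$ we obtain

$$T(a + bx) = 0 + ax + bx^2 = ax + bx^2$$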


Although Example 2 is simple, the procedure that it 
illustrates is applicable to problems of great 
complexity. 
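The three steps of Example 2 can be sketched with concrete numbers. This assumes the same coefficient-list convention as before, and the sample values $a = 2$, $b = 5$ are an arbitrary choice. Step 3 is written as a general linear combination $\sum_i c_i\mathbf{v}_i$ of the basis polynomials in $B'$. With the standard basis that step happens to return the coordinate list unchanged, but the same function would work for a nonstandard $B'$.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def combine(coords, basis):
    """Step 3: rebuild sum_i coords[i] * basis[i]; polynomials are coefficient lists."""
    size = max(len(b) for b in basis)
    result = [0] * size
    for c, b in zip(coords, basis):
        for k, bk in enumerate(b):
            result[k] += c * bk
    return result

T_BpB = [[0, 0], [1, 0], [0, 1]]    # [T]_{B',B} from Example 1
B_prime = [[1], [0, 1], [0, 0, 1]]  # v1 = 1, v2 = x, v3 = x^2
a, b = 2, 5                         # sample values for a + bx (an arbitrary choice)
x_B = [a, b]                        # Step 1: [x]_B relative to B = {1, x}
Tx_Bp = matvec(T_BpB, x_B)          # Step 2: [T(x)]_{B'}
Tx = combine(Tx_Bp, B_prime)        # Step 3: coefficients of T(x)
print(Tx_Bp, Tx)  # [0, 2, 5] [0, 2, 5], i.e. T(2 + 5x) = 2x + 5x^2
```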


EXAMPLE 3 Matrix for a Linear Transformation 


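Let $T: R^2 \to R^3$ be the linear transformation defined by

$$T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right) = \begin{bmatrix}x_2\\ -5x_1 + 13x_2\\ -7x_1 + 16x_2\end{bmatrix} = \begin{bmatrix}0 & 1\\ -5 & 13\\ -7 & 16\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$$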

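Find the matrix for the transformation $T$ with respect to the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where

$$\mathbf{u}_1 = \begin{bmatrix}3\\1\end{bmatrix},\ \mathbf{u}_2 = \begin{bmatrix}5\\2\end{bmatrix};\quad \mathbf{v}_1 = \begin{bmatrix}1\\0\\-1\end{bmatrix},\ \mathbf{v}_2 = \begin{bmatrix}-1\\2\\2\end{bmatrix},\ \mathbf{v}_3 = \begin{bmatrix}0\\1\\2\end{bmatrix}$$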

From the formula for $T$,

$$T(\mathbf{u}_1) = \begin{bmatrix}1\\-2\\-5\end{bmatrix},\quad T(\mathbf{u}_2) = \begin{bmatrix}2\\1\\-3\end{bmatrix}$$

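Expressing these vectors as linear combinations of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, we obtain (verify)

$$T(\mathbf{u}_1) = \mathbf{v}_1 - 2\mathbf{v}_3,\qquad T(\mathbf{u}_2) = 3\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3$$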


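Thus,

$$[T(\mathbf{u}_1)]_{B'} = \begin{bmatrix}1\\0\\-2\end{bmatrix},\quad [T(\mathbf{u}_2)]_{B'} = \begin{bmatrix}3\\1\\-1\end{bmatrix}$$

so

$$[T]_{B',B} = \big[\,[T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'}\,\big] = \begin{bmatrix}1 & 3\\ 0 & 1\\ -2 & -1\end{bmatrix}$$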


Example 3 illustrates that a fixed linear transformation generally has multiple representations, each depending on the bases chosen. In this case the matrices

$$[T] = \begin{bmatrix}0 & 1\\ -5 & 13\\ -7 & 16\end{bmatrix} \quad\text{and}\quad [T]_{B',B} = \begin{bmatrix}1 & 3\\ 0 & 1\\ -2 & -1\end{bmatrix}$$

both represent the transformation $T$, the first relative to the standard bases for $R^2$ and $R^3$, and the second relative to the bases $B$ and $B'$ stated in the example.
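The "(verify)" step in Example 3 amounts to solving a linear system: $[T(\mathbf{u}_j)]_{B'}$ is the vector $\mathbf{c}$ with $V\mathbf{c} = T(\mathbf{u}_j)$, where $V$ is the matrix whose columns are $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$. The sketch below does this with exact Gauss-Jordan elimination and assembles $[T]_{B',B}$ column by column. The helper `solve` is an illustrative function and assumes $V$ is invertible, which holds because $B'$ is a basis.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M c = rhs exactly by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def T(x1, x2):
    """The transformation of Example 3."""
    return [x2, -5 * x1 + 13 * x2, -7 * x1 + 16 * x2]

u = [(3, 1), (5, 2)]                             # u1, u2
v = [(1, 0, -1), (-1, 2, 2), (0, 1, 2)]          # v1, v2, v3
V = [[v[j][i] for j in range(3)] for i in range(3)]  # columns of V are v1, v2, v3
columns = [solve(V, T(*uj)) for uj in u]         # [T(u_j)]_{B'}
T_BpB = [[col[i] for col in columns] for i in range(3)]
print([[str(x) for x in row] for row in T_BpB])  # [['1', '3'], ['0', '1'], ['-2', '-1']]
```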


Matrices of Linear Operators 

In the special case where $V = W$ (so that $T: V \to V$ is a linear operator), it is usual to take $B = B'$ when constructing a matrix for $T$. In this case the resulting matrix is called the matrix for $T$ relative to the basis $B$ and is usually denoted by $[T]_B$ rather than $[T]_{B,B}$. If $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$, then Formulas 4 and 5 become

$$[T]_B = \big[\,[T(\mathbf{u}_1)]_B \mid [T(\mathbf{u}_2)]_B \mid \cdots \mid [T(\mathbf{u}_n)]_B\,\big] \qquad (7)$$

$$[T]_B[\mathbf{x}]_B = [T(\mathbf{x})]_B \qquad (8)$$

Phrased informally, Formulas 7 and 8 state that the matrix for $T$, when multiplied by the coordinate vector for $\mathbf{x}$, produces the coordinate vector for $T(\mathbf{x})$.


In the special case where $T: R^n \to R^n$ is a matrix operator, say multiplication by $A$, and $B$ is the standard basis for $R^n$, then Formula 7 simplifies to

$$[T]_B = A \qquad (9)$$


Matrices of Identity Operators 


Recall that the identity operator $I: V \to V$ maps every vector in $V$ into itself, that is, $I(\mathbf{x}) = \mathbf{x}$ for every vector $\mathbf{x}$ in $V$. The following example shows that if $V$ is $n$-dimensional, then the matrix for $I$ relative to any basis $B$ for $V$ is the $n\times n$ identity matrix.

EXAMPLE 4 Matrices of Identity Operators 

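If $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is a basis for a finite-dimensional vector space $V$, and if $I: V \to V$ is the identity operator on $V$, then

$$I(\mathbf{u}_1) = \mathbf{u}_1,\quad I(\mathbf{u}_2) = \mathbf{u}_2,\ \ldots,\quad I(\mathbf{u}_n) = \mathbf{u}_n$$

Therefore,

$$[I]_B = \big[\,[I(\mathbf{u}_1)]_B \mid [I(\mathbf{u}_2)]_B \mid \cdots \mid [I(\mathbf{u}_n)]_B\,\big] = \begin{bmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 1\end{bmatrix} = I$$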


EXAMPLE 5 Linear Operator on $P_2$


Let $T: P_2 \to P_2$ be the linear operator defined by

$T(p(x)) = p(3x - 5)$

(a) Find $[T]_B$ relative to the basis $B = \{1, x, x^2\}$.

(b) Use the indirect procedure to compute $T(1 + 2x + 3x^2)$.

(c) Check the result in (b) by computing $T(1 + 2x + 3x^2)$ directly.


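Solution

(a) From the formula for $T$,

$$T(1) = 1,\quad T(x) = 3x - 5,\quad T(x^2) = (3x - 5)^2 = 9x^2 - 30x + 25$$

so

$$[T(1)]_B = \begin{bmatrix}1\\0\\0\end{bmatrix},\quad [T(x)]_B = \begin{bmatrix}-5\\3\\0\end{bmatrix},\quad [T(x^2)]_B = \begin{bmatrix}25\\-30\\9\end{bmatrix}$$

Thus,

$$[T]_B = \begin{bmatrix}1 & -5 & 25\\ 0 & 3 & -30\\ 0 & 0 & 9\end{bmatrix}$$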


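(b) The coordinate matrix for $\mathbf{p} = 1 + 2x + 3x^2$ relative to the basis $B = \{1, x, x^2\}$ is

$$[\mathbf{p}]_B = \begin{bmatrix}1\\2\\3\end{bmatrix}$$

Multiplying $[\mathbf{p}]_B$ by the matrix $[T]_B$ found in part (a) we obtain

$$[T]_B[\mathbf{p}]_B = \begin{bmatrix}1 & -5 & 25\\ 0 & 3 & -30\\ 0 & 0 & 9\end{bmatrix}\begin{bmatrix}1\\2\\3\end{bmatrix} = \begin{bmatrix}66\\-84\\27\end{bmatrix} = [T(\mathbf{p})]_B$$

Reconstructing $T(\mathbf{p}) = T(1 + 2x + 3x^2)$ from $[T(\mathbf{p})]_B$ we obtain

$$T(1 + 2x + 3x^2) = 66 - 84x + 27x^2$$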

(c) By direct computation,

$$\begin{aligned} T(1 + 2x + 3x^2) &= 1 + 2(3x - 5) + 3(3x - 5)^2\\ &= 1 + 6x - 10 + 27x^2 - 90x + 75\\ &= 66 - 84x + 27x^2\end{aligned}$$

which agrees with the result in (b).
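All three parts of Example 5 can be reproduced by implementing the substitution $p(x) \mapsto p(3x - 5)$ on coefficient lists. The sketch evaluates $p(q(x))$ by Horner's scheme and then builds $[T]_B$ from the images of $1, x, x^2$. It also compares the indirect result with the direct one. The coefficient-list representation and the helper names are assumptions of this sketch.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def trim(p):
    """Drop trailing zero coefficients (keeping at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def compose(p, q):
    """Return p(q(x)) using Horner's scheme."""
    result = [0]
    for c in reversed(p):
        result = poly_mul(result, q)
        result[0] += c
    return trim(result)

def pad(p, n):
    return p + [0] * (n - len(p))

q = [-5, 3]                                       # 3x - 5
basis = [[1], [0, 1], [0, 0, 1]]                  # 1, x, x^2
columns = [pad(compose(b, q), 3) for b in basis]  # [T(1)]_B, [T(x)]_B, [T(x^2)]_B
T_B = [list(row) for row in zip(*columns)]        # part (a)
p = [1, 2, 3]                                     # 1 + 2x + 3x^2
indirect = [sum(t * c for t, c in zip(row, p)) for row in T_B]  # part (b)
direct = pad(compose(p, q), 3)                    # part (c)
print(T_B)                 # [[1, -5, 25], [0, 3, -30], [0, 0, 9]]
print(indirect)            # [66, -84, 27]
print(direct == indirect)  # True
```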


Matrices of Compositions and Inverse Transformations 

We will conclude this section by mentioning two theorems without proof that are generalizations of Formulas 4 and 7 of 
Section 4.10. 


THEOREM 8.4.1 

If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and if $B$, $B''$, and $B'$ are bases for $U$, $V$, and $W$, respectively, then

$$[T_2\circ T_1]_{B',B} = [T_2]_{B',B''}[T_1]_{B'',B} \qquad (10)$$


THEOREM 8.4.2 

If $T: V \to V$ is a linear operator, and if $B$ is a basis for $V$, then the following are equivalent.

(a) $T$ is one-to-one.

(b) $[T]_B$ is invertible.

Moreover, when these equivalent conditions hold,

$$[T^{-1}]_B = [T]_B^{-1} \qquad (11)$$


In 10, observe how the interior subscript $B''$ (the basis for the intermediate space $V$) seems to "cancel out," leaving only the bases for the domain and image space of the composition as subscripts (Figure 8.4.5). This cancellation of interior subscripts suggests the following extension of Formula 10 to compositions of three linear transformations (Figure 8.4.6):

$$[T_3\circ T_2\circ T_1]_{B',B} = [T_3]_{B',B'''}[T_2]_{B''',B''}[T_1]_{B'',B} \qquad (12)$$




[Figure 8.4.5: cancellation of the interior subscripts in Formula 10.]

[Figure 8.4.6: $T_1$, $T_2$, $T_3$ applied in succession to spaces with bases $B$, $B''$, $B'''$, and $B'$.]


The following example illustrates Theorem 8.4.1. 

EXAMPLE 6 Composition 

Let $T_1: P_1 \to P_2$ be the linear transformation defined by

$T_1(p(x)) = xp(x)$

and let $T_2: P_2 \to P_2$ be the linear operator defined by

$T_2(p(x)) = p(3x - 5)$

Then the composition $(T_2\circ T_1): P_1 \to P_2$ is given by

$$(T_2\circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (3x - 5)p(3x - 5)$$

Thus, if $p(x) = c_0 + c_1x$, then

$$(T_2\circ T_1)(c_0 + c_1x) = (3x - 5)(c_0 + c_1(3x - 5)) = c_0(3x - 5) + c_1(3x - 5)^2 \qquad (13)$$


In this example, $P_1$ plays the role of $U$ in Theorem 8.4.1, and $P_2$ plays the roles of both $V$ and $W$; thus we can take $B' = B''$ in 10 so that the formula simplifies to

$$[T_2\circ T_1]_{B',B} = [T_2]_{B'}[T_1]_{B',B} \qquad (14)$$


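Let us choose $B = \{1, x\}$ to be the basis for $P_1$ and choose $B' = \{1, x, x^2\}$ to be the basis for $P_2$. We showed in Examples 1 and 5 that

$$[T_1]_{B',B} = \begin{bmatrix}0 & 0\\ 1 & 0\\ 0 & 1\end{bmatrix} \quad\text{and}\quad [T_2]_{B'} = \begin{bmatrix}1 & -5 & 25\\ 0 & 3 & -30\\ 0 & 0 & 9\end{bmatrix}$$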

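Thus, it follows from 14 that

$$[T_2\circ T_1]_{B',B} = \begin{bmatrix}1 & -5 & 25\\ 0 & 3 & -30\\ 0 & 0 & 9\end{bmatrix}\begin{bmatrix}0 & 0\\ 1 & 0\\ 0 & 1\end{bmatrix} = \begin{bmatrix}-5 & 25\\ 3 & -30\\ 0 & 9\end{bmatrix} \qquad (15)$$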


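As a check, we will calculate $[T_2\circ T_1]_{B',B}$ directly from Formula 4. Since $B = \{1, x\}$, it follows from Formula 4 with $\mathbf{u}_1 = 1$ and $\mathbf{u}_2 = x$ that

$$[T_2\circ T_1]_{B',B} = \big[\,[(T_2\circ T_1)(1)]_{B'} \mid [(T_2\circ T_1)(x)]_{B'}\,\big] \qquad (16)$$

Using 13 yields

$$(T_2\circ T_1)(1) = 3x - 5 \quad\text{and}\quad (T_2\circ T_1)(x) = (3x - 5)^2 = 9x^2 - 30x + 25$$

From this and the fact that $B' = \{1, x, x^2\}$, it follows that

$$[(T_2\circ T_1)(1)]_{B'} = \begin{bmatrix}-5\\3\\0\end{bmatrix} \quad\text{and}\quad [(T_2\circ T_1)(x)]_{B'} = \begin{bmatrix}25\\-30\\9\end{bmatrix}$$

Substituting in 16 yields

$$[T_2\circ T_1]_{B',B} = \begin{bmatrix}-5 & 25\\ 3 & -30\\ 0 & 9\end{bmatrix}$$

which agrees with 15.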
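The check in Example 6 can be scripted: multiply the two matrices as Formula 14 prescribes, then compare with the columns computed directly from 13. This is a small sketch. The matrices are copied from Examples 1 and 5, and the `direct` columns are the coordinate vectors of $3x - 5$ and $9x^2 - 30x + 25$.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

T2_Bp = [[1, -5, 25], [0, 3, -30], [0, 0, 9]]   # [T2]_{B'} from Example 5
T1_BpB = [[0, 0], [1, 0], [0, 1]]              # [T1]_{B',B} from Example 1
product = matmul(T2_Bp, T1_BpB)                 # Formula 14
# Columns: coordinates of (T2 o T1)(1) = 3x - 5 and (T2 o T1)(x) = 9x^2 - 30x + 25
direct = [[-5, 25], [3, -30], [0, 9]]
print(product)            # [[-5, 25], [3, -30], [0, 9]]
print(product == direct)  # True
```

Note the order of the factors: the matrix of the transformation applied first ($T_1$) stands on the right, matching the order in $T_2\circ T_1$.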


















Concept Review 

Matrix for a linear transformation relative to bases 
Matrix for a linear operator relative to a basis 
The three-step procedure for finding T(x) 

Skills 

Find the matrix for a linear transformation $T: V \to W$ relative to bases of $V$ and $W$.

For a linear transformation $T: V \to W$, find $T(\mathbf{x})$ using the matrix for $T$ relative to bases of $V$ and $W$.


Exercise Set 8.4 


1. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x)$.

(a) Find the matrix for $T$ relative to the standard bases

$B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$

where

$\mathbf{u}_1 = 1,\ \mathbf{u}_2 = x,\ \mathbf{u}_3 = x^2$

$\mathbf{v}_1 = 1,\ \mathbf{v}_2 = x,\ \mathbf{v}_3 = x^2,\ \mathbf{v}_4 = x^3$

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies Formula 5 for every vector $\mathbf{x} = c_0 + c_1x + c_2x^2$ in $P_2$.


Answer: 


(a) $\begin{bmatrix}0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$


2. Let $T: P_2 \to P_1$ be the linear transformation defined by

$T(a_0 + a_1x + a_2x^2) = (a_0 + a_1) - (2a_1 + 3a_2)x$

(a) Find the matrix for $T$ relative to the standard bases $B = \{1, x, x^2\}$ and $B' = \{1, x\}$ for $P_2$ and $P_1$.

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies Formula 5 for every vector $\mathbf{x} = c_0 + c_1x + c_2x^2$ in $P_2$.


3. Let $T: P_2 \to P_2$ be the linear operator defined by

$T(a_0 + a_1x + a_2x^2) = a_0 + a_1(x - 1) + a_2(x - 1)^2$

(a) Find the matrix for $T$ relative to the standard basis $B = \{1, x, x^2\}$ for $P_2$.

(b) Verify that the matrix $[T]_B$ obtained in part (a) satisfies Formula 8 for every vector $\mathbf{x} = a_0 + a_1x + a_2x^2$ in $P_2$.


Answer: 




(a) $\begin{bmatrix}1 & -1 & 1\\ 0 & 1 & -2\\ 0 & 0 & 1\end{bmatrix}$


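4. Let $T: R^2 \to R^2$ be the linear operator defined by

$$T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right) = \begin{bmatrix}x_1 - x_2\\ x_1 + x_2\end{bmatrix}$$

and let $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ be the basis for which

$$\mathbf{u}_1 = \begin{bmatrix}1\\1\end{bmatrix} \quad\text{and}\quad \mathbf{u}_2 = \begin{bmatrix}-1\\0\end{bmatrix}$$

(a) Find $[T]_B$.

(b) Verify that Formula 8 holds for every vector $\mathbf{x}$ in $R^2$.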


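5. Let $T: R^2 \to R^3$ be defined by

$$T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right) = \begin{bmatrix}x_1 + 2x_2\\ -x_1\\ 0\end{bmatrix}$$

(a) Find the matrix $[T]_{B',B}$ relative to the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

$$\mathbf{u}_1 = \begin{bmatrix}1\\3\end{bmatrix},\ \mathbf{u}_2 = \begin{bmatrix}-2\\4\end{bmatrix};\quad \mathbf{v}_1 = \begin{bmatrix}1\\1\\1\end{bmatrix},\ \mathbf{v}_2 = \begin{bmatrix}2\\2\\0\end{bmatrix},\ \mathbf{v}_3 = \begin{bmatrix}3\\0\\0\end{bmatrix}$$

(b) Verify that Formula 5 holds for every vector in $R^2$.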

Answer: 

(a) $\begin{bmatrix}0 & 0\\ -\frac{1}{2} & 1\\ \frac{8}{3} & \frac{4}{3}\end{bmatrix}$

6. Let $T: R^3 \to R^3$ be the linear operator defined by

$T(x_1, x_2, x_3) = (x_1 - x_2, x_2 - x_1, x_1 - x_3)$

(a) Find the matrix for $T$ with respect to the basis $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

$\mathbf{v}_1 = (1, 0, 1),\ \mathbf{v}_2 = (0, 1, 1),\ \mathbf{v}_3 = (1, 1, 0)$

(b) Verify that Formula 8 holds for every vector $\mathbf{x} = (x_1, x_2, x_3)$ in $R^3$.

(c) Is $T$ one-to-one? If so, find the matrix of $T^{-1}$ with respect to the basis $B$.

7. Let $T: P_2 \to P_2$ be the linear operator defined by $T(p(x)) = p(2x + 1)$, that is,

$T(c_0 + c_1x + c_2x^2) = c_0 + c_1(2x + 1) + c_2(2x + 1)^2$

(a) Find $[T]_B$ with respect to the basis $B = \{1, x, x^2\}$.

(b) Use the three-step procedure illustrated in Example 2 to compute $T(2 - 3x + 4x^2)$.

(c) Check the result obtained in part (b) by computing $T(2 - 3x + 4x^2)$ directly.


Answer: 


(a) $\begin{bmatrix}1 & 1 & 1\\ 0 & 2 & 4\\ 0 & 0 & 4\end{bmatrix}$

(b) $3 + 10x + 16x^2$


8. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x - 3)$.

(a) Find $[T]_{B',B}$ relative to the bases $B = \{1, x, x^2\}$ and $B' = \{1, x, x^2, x^3\}$.

(b) Use the three-step procedure illustrated in Example 2 to compute $T(1 + x - x^2)$.

(c) Check the result obtained in part (b) by computing $T(1 + x - x^2)$ directly.


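9. Let $\mathbf{v}_1 = \begin{bmatrix}1\\3\end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix}-1\\4\end{bmatrix}$, and let

$$A = \begin{bmatrix}1 & 3\\ -2 & 5\end{bmatrix}$$

be the matrix for $T: R^2 \to R^2$ relative to the basis $B = \{\mathbf{v}_1, \mathbf{v}_2\}$.

(a) Find $[T(\mathbf{v}_1)]_B$ and $[T(\mathbf{v}_2)]_B$.

(b) Find $T(\mathbf{v}_1)$ and $T(\mathbf{v}_2)$.

(c) Find a formula for $T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)$.

(d) Use the formula obtained in (c) to compute $T\left(\begin{bmatrix}1\\1\end{bmatrix}\right)$.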



Answer: 


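(a) $[T(\mathbf{v}_1)]_B = \begin{bmatrix}1\\-2\end{bmatrix}$, $[T(\mathbf{v}_2)]_B = \begin{bmatrix}3\\5\end{bmatrix}$

(b) $T(\mathbf{v}_1) = \begin{bmatrix}3\\-5\end{bmatrix}$, $T(\mathbf{v}_2) = \begin{bmatrix}-2\\29\end{bmatrix}$

(c) $T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right) = \begin{bmatrix}\frac{18}{7}x_1 + \frac{1}{7}x_2\\ -\frac{107}{7}x_1 + \frac{24}{7}x_2\end{bmatrix}$

(d) $T\left(\begin{bmatrix}1\\1\end{bmatrix}\right) = \begin{bmatrix}\frac{19}{7}\\ -\frac{83}{7}\end{bmatrix}$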


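10. Let

$$A = \begin{bmatrix}3 & -2 & 1 & 0\\ 1 & 6 & 2 & 1\\ -3 & 0 & 7 & 1\end{bmatrix}$$

be the matrix for $T: R^4 \to R^3$ relative to the bases $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ and $B' = \{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}$, where

$$\mathbf{v}_1 = \begin{bmatrix}0\\1\\1\\1\end{bmatrix},\ \mathbf{v}_2 = \begin{bmatrix}2\\1\\-1\\-1\end{bmatrix},\ \mathbf{v}_3 = \begin{bmatrix}1\\4\\-1\\2\end{bmatrix},\ \mathbf{v}_4 = \begin{bmatrix}6\\9\\4\\2\end{bmatrix}$$

$$\mathbf{w}_1 = \begin{bmatrix}0\\8\\8\end{bmatrix},\ \mathbf{w}_2 = \begin{bmatrix}-7\\8\\1\end{bmatrix},\ \mathbf{w}_3 = \begin{bmatrix}-6\\9\\1\end{bmatrix}$$

(a) Find $[T(\mathbf{v}_1)]_{B'}$, $[T(\mathbf{v}_2)]_{B'}$, $[T(\mathbf{v}_3)]_{B'}$, and $[T(\mathbf{v}_4)]_{B'}$.

(b) Find $T(\mathbf{v}_1)$, $T(\mathbf{v}_2)$, $T(\mathbf{v}_3)$, and $T(\mathbf{v}_4)$.

(c) Find a formula for $T\left(\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}\right)$.

(d) Use the formula obtained in (c) to compute T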


11. Let

$$A = \begin{bmatrix}1 & 3 & -1\\ 2 & 0 & 5\\ 6 & -2 & 4\end{bmatrix}$$

be the matrix for $T: P_2 \to P_2$ with respect to the basis $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

$\mathbf{v}_1 = 3x + 3x^2,\ \mathbf{v}_2 = -1 + 3x + 2x^2,\ \mathbf{v}_3 = 3 + 7x + 2x^2$

(a) Find $[T(\mathbf{v}_1)]_B$, $[T(\mathbf{v}_2)]_B$, and $[T(\mathbf{v}_3)]_B$.

(b) Find $T(\mathbf{v}_1)$, $T(\mathbf{v}_2)$, and $T(\mathbf{v}_3)$.

(c) Find a formula for $T(a_0 + a_1x + a_2x^2)$.

(d) Use the formula obtained in (c) to compute $T(1 + x^2)$.
Answer:
Answer: 


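(a) $[T(\mathbf{v}_1)]_B = \begin{bmatrix}1\\2\\6\end{bmatrix}$, $[T(\mathbf{v}_2)]_B = \begin{bmatrix}3\\0\\-2\end{bmatrix}$, $[T(\mathbf{v}_3)]_B = \begin{bmatrix}-1\\5\\4\end{bmatrix}$

(b) $T(\mathbf{v}_1) = 16 + 51x + 19x^2$, $T(\mathbf{v}_2) = -6 - 5x + 5x^2$, $T(\mathbf{v}_3) = 7 + 40x + 15x^2$

(c) $T(a_0 + a_1x + a_2x^2) = \frac{239a_0 - 161a_1 + 289a_2}{24} + \frac{201a_0 - 111a_1 + 247a_2}{8}x + \frac{61a_0 - 31a_1 + 107a_2}{12}x^2$

(d) $T(1 + x^2) = 22 + 56x + 14x^2$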


12. Let $T_1: P_1 \to P_2$ be the linear transformation defined by

$T_1(p(x)) = xp(x)$

and let $T_2: P_2 \to P_2$ be the linear operator defined by

$T_2(p(x)) = p(2x + 1)$

Let $B = \{1, x\}$ and $B' = \{1, x, x^2\}$ be the standard bases for $P_1$ and $P_2$.

(a) Find $[T_2\circ T_1]_{B',B}$, $[T_2]_{B'}$, and $[T_1]_{B',B}$.

(b) State a formula relating the matrices in part (a).

(c) Verify that the matrices in part (a) satisfy the formula you stated in part (b).

13. Let $T_1: P_1 \to P_2$ be the linear transformation defined by

$T_1(c_0 + c_1x) = 2c_0 - 3c_1x$

and let $T_2: P_2 \to P_3$ be the linear transformation defined by

$T_2(c_0 + c_1x + c_2x^2) = 2c_0x + 2c_1x^2 + 3c_2x^3$

Let $B = \{1, x\}$, $B'' = \{1, x, x^2\}$, and $B' = \{1, x, x^2, x^3\}$.

(a) Find $[T_2\circ T_1]_{B',B}$, $[T_2]_{B',B''}$, and $[T_1]_{B'',B}$.

(b) State a formula relating the matrices in part (a).

(c) Verify that the matrices in part (a) satisfy the formula you stated in part (b).

Answer: 

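(a) $[T_2\circ T_1]_{B',B} = \begin{bmatrix}0 & 0\\ 4 & 0\\ 0 & -6\\ 0 & 0\end{bmatrix}$, $[T_2]_{B',B''} = \begin{bmatrix}0 & 0 & 0\\ 2 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{bmatrix}$, $[T_1]_{B'',B} = \begin{bmatrix}2 & 0\\ 0 & -3\\ 0 & 0\end{bmatrix}$

(b) $[T_2\circ T_1]_{B',B} = [T_2]_{B',B''}[T_1]_{B'',B}$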

14. Show that if $T: V \to W$ is the zero transformation, then the matrix for $T$ with respect to any bases for $V$ and $W$ is a zero matrix.

15. Show that if $T: V \to V$ is a contraction or a dilation of $V$ (Example 4 of Section 8.1), then the matrix for $T$ relative to any basis for $V$ is a positive scalar multiple of the identity matrix.

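16. Let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ be a basis for a vector space $V$. Find the matrix with respect to $B$ of the linear operator $T: V \to V$ defined by $T(\mathbf{v}_1) = \mathbf{v}_2$, $T(\mathbf{v}_2) = \mathbf{v}_3$, $T(\mathbf{v}_3) = \mathbf{v}_4$, $T(\mathbf{v}_4) = \mathbf{v}_1$.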

17. Prove that if $B$ and $B'$ are the standard bases for $R^n$ and $R^m$, respectively, then the matrix for a linear transformation $T: R^n \to R^m$ relative to the bases $B$ and $B'$ is the standard matrix for $T$.

18. (Calculus required) Let $D: P_2 \to P_2$ be the differentiation operator $D(\mathbf{p}) = p'(x)$. In parts (a) and (b), find the matrix of $D$ relative to the basis $B = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.

(a) $\mathbf{p}_1 = 1$, $\mathbf{p}_2 = x$, $\mathbf{p}_3 = x^2$

(b) $\mathbf{p}_1 = 2$, $\mathbf{p}_2 = 2 - 3x$, $\mathbf{p}_3 = 2 - 3x + 8x^2$

(c) Use the matrix in part (a) to compute $D(6 - 6x + 24x^2)$.

(d) Repeat the directions for part (c) for the matrix in part (b).

19. (Calculus required) In each part, suppose that $B = \{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$ is a basis for a subspace $V$ of the vector space of real-valued functions defined on the real line. Find the matrix with respect to $B$ for the differentiation operator $D: V \to V$.

(a) $\mathbf{f}_1 = 1$, $\mathbf{f}_2 = \sin x$, $\mathbf{f}_3 = \cos x$

(b) $\mathbf{f}_1 = 1$, $\mathbf{f}_2 = e^x$, $\mathbf{f}_3 = e^{2x}$

(c) $\mathbf{f}_1 = e^{2x}$, $\mathbf{f}_2 = xe^{2x}$, $\mathbf{f}_3 = x^2e^{2x}$

(d) Use the matrix in part (c) to compute $D(4e^{2x} + 6xe^{2x} - 10x^2e^{2x})$.


Answer: 


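(a) $\begin{bmatrix}0 & 0 & 0\\ 0 & 0 & -1\\ 0 & 1 & 0\end{bmatrix}$

(b) $\begin{bmatrix}0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 2\end{bmatrix}$

(c) $\begin{bmatrix}2 & 1 & 0\\ 0 & 2 & 2\\ 0 & 0 & 2\end{bmatrix}$

(d) $14e^{2x} - 8xe^{2x} - 20x^2e^{2x}$, since

$$\begin{bmatrix}2 & 1 & 0\\ 0 & 2 & 2\\ 0 & 0 & 2\end{bmatrix}\begin{bmatrix}4\\6\\-10\end{bmatrix} = \begin{bmatrix}14\\-8\\-20\end{bmatrix}$$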
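Parts (c) and (d) of Exercise 19 can be cross-checked with a closed form for the derivative. For $f = (a + bx + cx^2)e^{2x}$, the product rule gives $f' = \big((2a + b) + (2b + 2c)x + 2cx^2\big)e^{2x}$, so differentiation acts on the coordinate triple $(a, b, c)$. The sketch below builds $[D]_B$ from that rule, applied to each basis function, and then applies it to $(4, 6, -10)$. The function name `derivative_coords` is illustrative.

```python
def derivative_coords(a, b, c):
    """B-coordinates of D((a + b x + c x^2) e^{2x}), from the product rule:
    d/dx[(a + bx + cx^2) e^{2x}] = ((2a + b) + (2b + 2c)x + 2c x^2) e^{2x}."""
    return [2 * a + b, 2 * b + 2 * c, 2 * c]

# Column j of [D]_B is the coordinate vector of D(f_j)
cols = [derivative_coords(*e) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
D_B = [[col[i] for col in cols] for i in range(3)]
f = [4, 6, -10]  # 4e^{2x} + 6xe^{2x} - 10x^2 e^{2x}
Df = [sum(d * c for d, c in zip(row, f)) for row in D_B]
print(D_B)  # [[2, 1, 0], [0, 2, 2], [0, 0, 2]]
print(Df)   # [14, -8, -20]
```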


20. Let $V$ be a four-dimensional vector space with basis $B$, let $W$ be a seven-dimensional vector space with basis $B'$, and let $T: V \to W$ be a linear transformation. Identify the four vector spaces that contain the vectors at the corners of the accompanying diagram.

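[Figure Ex-20: $\mathbf{x} \to T(\mathbf{x})$ by direct computation; indirectly, (1) $\mathbf{x} \to [\mathbf{x}]_B$, (2) multiply by $[T]_{B',B}$ to obtain $[T(\mathbf{x})]_{B'}$, (3) $[T(\mathbf{x})]_{B'} \to T(\mathbf{x})$.]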


21. In each part, fill in the missing part of the equation.

(a) $[T_2\circ T_1]_{B',B} = [T_2]_{?}[T_1]_{B'',B}$

(b) $[T_3\circ T_2\circ T_1]_{B',B} = [T_3]_{?}[T_2]_{B''',B''}[T_1]_{B'',B}$

Answer:

(a) $B', B''$

(b) $B', B'''$

True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 


(a) If the matrix of a linear transformation $T: V \to W$ relative to some bases of $V$ and $W$ is $\begin{bmatrix}2 & 4\\ 0 & 3\end{bmatrix}$, then there is a nonzero vector $\mathbf{x}$ in $V$ such that $T(\mathbf{x}) = 2\mathbf{x}$.

Answer:

False

(b) If the matrix of a linear transformation $T: V \to W$ relative to bases for $V$ and $W$ is $\begin{bmatrix}2 & 4\\ 0 & 3\end{bmatrix}$, then there is a nonzero vector $\mathbf{x}$ in $V$ such that $T(\mathbf{x}) = 4\mathbf{x}$.

Answer:

False

(c) If the matrix of a linear transformation $T: V \to W$ relative to certain bases for $V$ and $W$ is $\begin{bmatrix}2 & 4\\ 0 & 3\end{bmatrix}$, then $T$ is one-to-one.

Answer:

True

(d) If $S: V \to V$ and $T: V \to V$ are linear operators and $B$ is a basis for $V$, then the matrix of $S\circ T$ relative to $B$ is $[T]_B[S]_B$.

Answer:

False

(e) If $T: V \to V$ is an invertible linear operator and $B$ is a basis for $V$, then the matrix for $T^{-1}$ relative to $B$ is $[T]_B^{-1}$.

Answer:

True
8.5 Similarity

The matrix for a linear operator $T: V \to V$ depends on the basis selected for $V$. One of the fundamental problems of linear algebra is to choose a basis for $V$ that makes the matrix for $T$ as simple as possible (a diagonal or a triangular matrix, for example). In this section we will study this problem.


Simple Matrices for Linear Operators 


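Standard bases do not necessarily produce the simplest matrices for linear operators. For example, consider the matrix operator $T: R^2 \to R^2$ whose standard matrix is

$$[T] = \begin{bmatrix}1 & 1\\ -2 & 4\end{bmatrix} \qquad (1)$$

and view $[T]$ as the matrix for $T$ relative to the standard basis $B = \{\mathbf{e}_1, \mathbf{e}_2\}$ for $R^2$. Let us compare this to the matrix for $T$ relative to the basis $B' = \{\mathbf{u}'_1, \mathbf{u}'_2\}$ for $R^2$ in which

$$\mathbf{u}'_1 = \begin{bmatrix}1\\1\end{bmatrix},\quad \mathbf{u}'_2 = \begin{bmatrix}1\\2\end{bmatrix} \qquad (2)$$

Since

$$T(\mathbf{u}'_1) = \begin{bmatrix}1 & 1\\ -2 & 4\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}2\\2\end{bmatrix} = 2\mathbf{u}'_1 \quad\text{and}\quad T(\mathbf{u}'_2) = \begin{bmatrix}1 & 1\\ -2 & 4\end{bmatrix}\begin{bmatrix}1\\2\end{bmatrix} = \begin{bmatrix}3\\6\end{bmatrix} = 3\mathbf{u}'_2$$

it follows that

$$[T(\mathbf{u}'_1)]_{B'} = \begin{bmatrix}2\\0\end{bmatrix} \quad\text{and}\quad [T(\mathbf{u}'_2)]_{B'} = \begin{bmatrix}0\\3\end{bmatrix}$$

so the matrix for $T$ relative to the basis $B'$ is

$$[T]_{B'} = \big[\,[T(\mathbf{u}'_1)]_{B'} \mid [T(\mathbf{u}'_2)]_{B'}\,\big] = \begin{bmatrix}2 & 0\\ 0 & 3\end{bmatrix}$$

This matrix, being diagonal, has a simpler form than $[T]$ and conveys clearly that the operator $T$ scales $\mathbf{u}'_1$ by a factor of 2 and $\mathbf{u}'_2$ by a factor of 3, information that is not immediately evident from $[T]$.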

One of the major themes in more advanced linear algebra courses is to determine the “simplest possible form” that can be 
obtained for the matrix of a linear operator by choosing the basis appropriately. Sometimes it is possible to obtain a 
diagonal matrix (as above, for example), whereas other times one must settle for a triangular matrix or some other form. 

We will only be able to touch on this important topic in this text. 

The problem of finding a basis that produces the simplest possible matrix for a linear operator $T: V \to V$ can be attacked by first finding a matrix for $T$ relative to any basis, typically a standard basis, where applicable, and then changing the basis in a way that simplifies the matrix. Before pursuing this idea, it will be helpful to revisit some concepts about changing bases.


A New View of Transition Matrices 

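Recall from Formulas 7 and 8 of Section 4.6 that if $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ and $B' = \{\mathbf{u}'_1, \mathbf{u}'_2, \ldots, \mathbf{u}'_n\}$ are bases for a vector space $V$, then the transition matrices from $B$ to $B'$ and from $B'$ to $B$ are

$$P_{B\to B'} = \big[\,[\mathbf{u}_1]_{B'} \mid [\mathbf{u}_2]_{B'} \mid \cdots \mid [\mathbf{u}_n]_{B'}\,\big] \qquad (3)$$

$$P_{B'\to B} = \big[\,[\mathbf{u}'_1]_B \mid [\mathbf{u}'_2]_B \mid \cdots \mid [\mathbf{u}'_n]_B\,\big] \qquad (4)$$

where the matrices $P_{B\to B'}$ and $P_{B'\to B}$ are inverses of each other. We also showed in Formulas 9 and 10 of that section that if $\mathbf{v}$ is any vector in $V$, then

$$P_{B\to B'}[\mathbf{v}]_B = [\mathbf{v}]_{B'} \qquad (5)$$

$$P_{B'\to B}[\mathbf{v}]_{B'} = [\mathbf{v}]_B \qquad (6)$$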
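Formulas 4, 5, and 6 can be exercised on a concrete pair of bases. The sketch below takes $B$ to be the standard basis of $R^2$ and $B'$ to be the basis from (2), so that the columns of $P_{B'\to B}$ are simply $\mathbf{u}'_1$ and $\mathbf{u}'_2$, and obtains $P_{B\to B'}$ as its inverse. The sample vector $\mathbf{v} = 3\mathbf{e}_1 + 5\mathbf{e}_2$ is an arbitrary choice made for illustration.

```python
from fractions import Fraction

def inv2(M):
    """Inverse of an invertible 2x2 matrix, with exact Fraction entries."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

P_Bp_to_B = [[1, 1], [1, 2]]   # Formula 4: columns [u1']_B and [u2']_B
P_B_to_Bp = inv2(P_Bp_to_B)    # P_{B->B'} is the inverse of P_{B'->B}
v_B = [3, 5]                   # sample vector v = 3e1 + 5e2 (an arbitrary choice)
v_Bp = matvec(P_B_to_Bp, v_B)  # Formula 5: [v]_{B'} = P_{B->B'} [v]_B
print([str(c) for c in v_Bp])  # ['1', '2'], i.e. v = 1*u1' + 2*u2'
print(matvec(P_Bp_to_B, v_Bp) == v_B)  # Formula 6 recovers [v]_B: True
```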


The following theorem shows that transition matrices in Formulas 3 and 4 can be viewed as matrices for identity operators. 


THEOREM 8.5.1 

If $B$ and $B'$ are bases for a finite-dimensional vector space $V$, and if $I: V \to V$ is the identity operator on $V$, then

$$P_{B\to B'} = [I]_{B',B} \quad\text{and}\quad P_{B'\to B} = [I]_{B,B'}$$

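Suppose that $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ and $B' = \{\mathbf{u}'_1, \mathbf{u}'_2, \ldots, \mathbf{u}'_n\}$ are bases for $V$. Using the fact that $I(\mathbf{v}) = \mathbf{v}$ for all $\mathbf{v}$ in $V$, it follows from Formula 4 of Section 8.4 that

$$[I]_{B',B} = \big[\,[I(\mathbf{u}_1)]_{B'} \mid [I(\mathbf{u}_2)]_{B'} \mid \cdots \mid [I(\mathbf{u}_n)]_{B'}\,\big] = \big[\,[\mathbf{u}_1]_{B'} \mid [\mathbf{u}_2]_{B'} \mid \cdots \mid [\mathbf{u}_n]_{B'}\,\big] = P_{B\to B'}$$

where the last equality is Formula (3) above. The proof that $[I]_{B,B'} = P_{B'\to B}$ is similar.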

Effect of Changing Bases on Matrices of Linear Operators 

We are now ready to consider the main problem in this section. 


PROBLEM 

If $B$ and $B'$ are two bases for a finite-dimensional vector space $V$, and if $T: V \to V$ is a linear operator, what relationship, if any, exists between the matrices $[T]_B$ and $[T]_{B'}$?


The answer to this question can be obtained by considering the composition of the three linear operators on V pictured in 
Figure 8.5.1. 


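[Figure 8.5.1: $\mathbf{v} \xrightarrow{I} \mathbf{v} \xrightarrow{T} T(\mathbf{v}) \xrightarrow{I} T(\mathbf{v})$, with the four copies of $V$ assigned the bases $B'$, $B$, $B$, and $B'$.]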

In this figure, $\mathbf{v}$ is first mapped into itself by the identity operator, then $\mathbf{v}$ is mapped into $T(\mathbf{v})$ by $T$, and then $T(\mathbf{v})$ is mapped into itself by the identity operator. All four vector spaces involved in the composition are the same (namely, $V$), but the bases for the spaces vary. Since the starting vector is $\mathbf{v}$ and the final vector is $T(\mathbf{v})$, the composition produces the same result as applying $T$ directly; that is,


$$T = I\circ T\circ I \qquad (7)$$

If, as illustrated in Figure 8.5.1, the first and last vector spaces are assigned the basis $B'$ and the middle two spaces are assigned the basis $B$, then it follows from 7 and Formula 12 of Section 8.4 (with an appropriate adjustment to the names of the bases) that


$$[T]_{B',B'} = [I\circ T\circ I]_{B',B'} = [I]_{B',B}[T]_{B,B}[I]_{B,B'} \qquad (8)$$


or, in simpler notation,

$$[T]_{B'} = [I]_{B',B}[T]_B[I]_{B,B'} \qquad (9)$$


We can simplify this formula even further by using Theorem 8.5.1 to rewrite it as

$$[T]_{B'} = P_{B\to B'}[T]_B P_{B'\to B} \qquad (10)$$


In summary, we have the following theorem. 


THEOREM 8.5.2 

Let $T: V \to V$ be a linear operator on a finite-dimensional vector space $V$, and let $B$ and $B'$ be bases for $V$. Then

$$[T]_{B'} = P^{-1}[T]_B P \qquad (11)$$

where $P = P_{B'\to B}$ and $P^{-1} = P_{B\to B'}$.


When applying Theorem 8.5.2, it is easy to forget whether $P = P_{B'\to B}$ (correct) or $P = P_{B\to B'}$ (incorrect). It may help to use the diagram in Figure 8.5.2 and observe that the exterior subscripts of the transition matrices match the subscript of the matrix they enclose.

$$[T]_{B'} = P_{B\to B'}[T]_B P_{B'\to B}$$

[Figure 8.5.2: in this formula the exterior subscripts $B$ (at the far left and far right) match the subscript of the enclosed matrix $[T]_B$.]


In the terminology of Definition 1 of Section 5.2, Theorem 8.5.2 tells us that matrices representing the same linear operator 
relative to different bases must be similar. The following theorem is a rephrasing of Theorem 8.5.2 in the language of 
similarity. 


THEOREM 8.5.3 

Two matrices, $A$ and $B$, are similar if and only if they represent the same linear operator. Moreover, if $B = P^{-1}AP$, then $P$ is the transition matrix from the basis relative to matrix $B$ to the basis relative to matrix $A$.


EXAMPLE 1 Similar Matrices Represent the Same Linear Operator 


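We showed at the beginning of this section that the matrices

$$C = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$

represent the same linear operator $T: R^2 \to R^2$. Verify that these matrices are similar by finding a matrix $P$ for which $D = P^{-1}CP$.

We need to find the transition matrix

$$P = P_{B'\to B} = \big[\,[\mathbf{u}_1']_B \mid [\mathbf{u}_2']_B\,\big]$$

where $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ is the basis for $R^2$ given by (2) and $B = \{\mathbf{e}_1, \mathbf{e}_2\}$ is the standard basis for $R^2$. We see by inspection that

$$\mathbf{u}_1' = \mathbf{e}_1 + \mathbf{e}_2, \qquad \mathbf{u}_2' = \mathbf{e}_1 + 2\mathbf{e}_2$$

from which it follows that

$$[\mathbf{u}_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2']_B = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

Thus,

$$P = P_{B'\to B} = \big[\,[\mathbf{u}_1']_B \mid [\mathbf{u}_2']_B\,\big] = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

We leave it for you to verify that

$$P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$$

and hence that

$$\underbrace{\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}}_{D} = \underbrace{\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}}_{P^{-1}} \underbrace{\begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}}_{C} \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}}_{P}$$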
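The arithmetic left to the reader in Example 1 can be checked mechanically. The following Python sketch uses exact arithmetic from the standard `fractions` module; the helper names `matmul` and `inverse_2x2` are our own, chosen for illustration. It computes $P^{-1}$ and the product $P^{-1}CP$:

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of an invertible 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


C = [[1, 1], [-2, 4]]   # matrix for T relative to the standard basis B
P = [[1, 1], [1, 2]]    # P = P_{B'->B}: columns are [u1']_B and [u2']_B

P_inv = inverse_2x2(P)
D = matmul(matmul(P_inv, C), P)
print(P_inv)
print(D)
```

The first line of output is $P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$ and the second is $P^{-1}CP = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = D$, with entries printed as `Fraction` objects.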





















Similarity Invariants 


Recall from Section 5.2 that a property of a square matrix is called a similarity invariant if that property is shared by all similar matrices. In Table 1 of that section (table reproduced below), we listed the most important similarity invariants. Since we know from Theorem 8.5.3 that two matrices are similar if and only if they represent the same linear operator $T: V \to V$, it follows that if $B$ and $B'$ are bases for $V$, then every similarity invariant property of $[T]_B$ is also a similarity invariant property of $[T]_{B'}$. For example, for any two bases $B$ and $B'$ we must have

$$\det([T]_B) = \det([T]_{B'})$$

It follows from this equation that the value of the determinant depends on $T$, but not on the particular basis that is used to obtain the matrix for $T$. Thus, the determinant can be regarded as a property of the linear operator $T$; indeed, if $V$ is a finite-dimensional vector space, then we can define the determinant of the linear operator $T$ to be

$$\det(T) = \det([T]_B) \tag{12}$$

where $B$ is any basis for $V$.
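Formula (12) can be illustrated numerically: whichever basis is used, the determinant of the resulting matrix for $T$ is the same. In this Python sketch the operator is the one with standard matrix $\begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}$ from Example 1, and the three alternative bases (given by their transition matrices $P_{B'\to B}$) are arbitrary choices made only for illustration.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    """Inverse of an invertible 2x2 matrix, with exact fractions."""
    d = Fraction(det2(M))
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


T_std = [[1, 1], [-2, 4]]  # [T]_B for the standard basis B

# Illustrative bases B', each given by P = P_{B'->B} (columns = new basis vectors).
transition_matrices = [
    [[1, 1], [1, 2]],
    [[2, -3], [1, 4]],
    [[0, 1], [1, 0]],
]

# By Theorem 8.5.2, [T]_{B'} = P^{-1} [T]_B P for each choice of P.
dets = [det2(matmul(matmul(inv2(P), T_std), P)) for P in transition_matrices]
print(det2(T_std), dets)
```

Every determinant in the output equals 6, as (12) predicts; only the matrix for $T$ changes with the basis, not its determinant.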


Similarity Invariants

Determinant: $A$ and $P^{-1}AP$ have the same determinant.

Invertibility: $A$ is invertible if and only if $P^{-1}AP$ is invertible.

Rank: $A$ and $P^{-1}AP$ have the same rank.

Nullity: $A$ and $P^{-1}AP$ have the same nullity.

Trace: $A$ and $P^{-1}AP$ have the same trace.

Characteristic polynomial: $A$ and $P^{-1}AP$ have the same characteristic polynomial.

Eigenvalues: $A$ and $P^{-1}AP$ have the same eigenvalues.

Eigenspace dimension: If $\lambda$ is an eigenvalue of $A$ and $P^{-1}AP$, then the eigenspace of $A$ corresponding to $\lambda$ and the eigenspace of $P^{-1}AP$ corresponding to $\lambda$ have the same dimension.
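To see several rows of the table in action at once, the Python sketch below computes the trace, determinant, rank, nullity, and characteristic polynomial of the similar matrices $C$ and $D = P^{-1}CP$ of Example 1. The helper functions are illustrative names of our own; the rank is found by Gaussian elimination in exact arithmetic, and the characteristic polynomial uses the $2 \times 2$ formula $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, so this sketch assumes $2 \times 2$ input.

```python
from fractions import Fraction


def rank(M):
    """Rank via Gaussian elimination with exact fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(A[0])):
        # Find a row at or below row r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][col] / A[r][col]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r


def invariants_2x2(M):
    """Several similarity invariants of a 2x2 matrix."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    rk = rank(M)
    return {
        "trace": tr,
        "determinant": det,
        "rank": rk,
        "nullity": 2 - rk,
        # coefficients of lambda^2 - tr*lambda + det
        "char_poly": (1, -tr, det),
    }


C = [[1, 1], [-2, 4]]
D = [[2, 0], [0, 3]]
print(invariants_2x2(C))
print(invariants_2x2(D))
```

Both matrices have trace 5, determinant 6, rank 2, nullity 0, and characteristic polynomial $\lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3)$, so their eigenvalues agree as well.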


EXAMPLE 2 Determinant of a Linear Operator 


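At the beginning of this section we showed that the matrices

$$[T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad [T]_{B'} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$

represent the same linear operator relative to different bases, the first relative to the standard basis $B = \{\mathbf{e}_1, \mathbf{e}_2\}$ for $R^2$ and the second relative to the basis $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for which

$$\mathbf{u}_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{u}_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

This means that $[T]$ and $[T]_{B'}$ must be similar matrices and hence must have the same similarity invariant properties. In particular, they must have the same determinant. We leave it for you to verify that

$$\det[T] = \det\begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} = 6 \quad\text{and}\quad \det[T]_{B'} = \det\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = 6$$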

























EXAMPLE 3 Eigenvalues and Bases for Eigenspaces 


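Find the eigenvalues and bases for the eigenspaces of the linear operator $T: P_2 \to P_2$ defined by

$$T(a + bx + cx^2) = -2c + (a + 2b + c)x + (a + 3c)x^2$$

We leave it for you to show that the matrix for $T$ with respect to the standard basis $B = \{1, x, x^2\}$ is

$$[T]_B = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

The eigenvalues of $T$ are $\lambda = 1$ and $\lambda = 2$ (Example 7 of Section 5.1). Also from that example, the eigenspace of $[T]_B$ corresponding to $\lambda = 2$ has the basis $\{\mathbf{u}_1, \mathbf{u}_2\}$, where

$$\mathbf{u}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

and the eigenspace of $[T]_B$ corresponding to $\lambda = 1$ has the basis $\{\mathbf{u}_3\}$, where

$$\mathbf{u}_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

The matrices $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ are the coordinate matrices relative to $B$ of

$$\mathbf{p}_1 = -1 + x^2, \qquad \mathbf{p}_2 = x, \qquad \mathbf{p}_3 = -2 + x + x^2$$

Thus, the eigenspace of $T$ corresponding to $\lambda = 2$ has the basis

$$\{\mathbf{p}_1, \mathbf{p}_2\} = \{-1 + x^2,\ x\}$$

and that corresponding to $\lambda = 1$ has the basis

$$\{\mathbf{p}_3\} = \{-2 + x + x^2\}$$

As a check, you can use the given formula for $T$ to verify that

$$T(\mathbf{p}_1) = 2\mathbf{p}_1, \qquad T(\mathbf{p}_2) = 2\mathbf{p}_2, \qquad T(\mathbf{p}_3) = \mathbf{p}_3$$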
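The check suggested at the end of Example 3 is easy to automate. In this Python sketch a polynomial $a + bx + cx^2$ in $P_2$ is stored as the coefficient tuple `(a, b, c)`, a representation chosen here for illustration. The sketch builds $[T]_B$ column by column from the images of $1, x, x^2$ and then tests the three eigenvector equations.

```python
def T(p):
    """T(a + bx + cx^2) = -2c + (a + 2b + c)x + (a + 3c)x^2, on coefficient tuples."""
    a, b, c = p
    return (-2 * c, a + 2 * b + c, a + 3 * c)


def scale(k, p):
    """Scalar multiple k*p of a polynomial stored as a coefficient tuple."""
    return tuple(k * coeff for coeff in p)


# Column j of [T]_B is the coordinate vector of T applied to the j-th basis vector.
standard_basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # 1, x, x^2
images = [T(e) for e in standard_basis]
T_B = [[images[j][i] for j in range(3)] for i in range(3)]

p1 = (-1, 0, 1)   # -1 + x^2
p2 = (0, 1, 0)    # x
p3 = (-2, 1, 1)   # -2 + x + x^2

print(T_B)
print(T(p1) == scale(2, p1), T(p2) == scale(2, p2), T(p3) == scale(1, p3))
```

The matrix printed first agrees with $[T]_B$ above, and the three comparisons print `True True True`, confirming $T(\mathbf{p}_1) = 2\mathbf{p}_1$, $T(\mathbf{p}_2) = 2\mathbf{p}_2$, and $T(\mathbf{p}_3) = \mathbf{p}_3$.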


Concept Review 

Similarity of matrices representing a linear operator 
Similarity invariant 
Determinant of a linear operator 

Skills 

Show that two matrices $A$ and $B$ represent the same linear operator, and find a transition matrix $P$ so that $B = P^{-1}AP$.

Find the eigenvalues and bases for the eigenspaces of a linear operator on a finite-dimensional vector space.


Exercise Set 8.5 


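In Exercises 1–7, find the matrix for $T$ relative to the basis $B$, and use Theorem 8.5.2 to compute the matrix for $T$ relative to the basis $B'$.

1. $T: R^2 \to R^2$ is defined by

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 - 2x_2 \\ -x_2 \end{bmatrix}$$

and $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2\}$, where

$$\mathbf{u}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}; \quad \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -3 \\ 4 \end{bmatrix}$$

Answer:

$$[T]_B = \begin{bmatrix} 1 & -2 \\ 0 & -1 \end{bmatrix}, \qquad [T]_{B'} = \begin{bmatrix} -\frac{3}{11} & -\frac{56}{11} \\ -\frac{2}{11} & \frac{3}{11} \end{bmatrix}$$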

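2. $T: R^2 \to R^2$ is defined by

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 + 7x_2 \\ \cdots \end{bmatrix}$$

(the second component of this formula is not recoverable from the source), and $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2\}$, where

$$\mathbf{u}_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 4 \\ -1 \end{bmatrix}; \quad \mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$

3. $T: R^2 \to R^2$ is the rotation about the origin through an angle of $45^\circ$; $B$ and $B'$ are the bases in Exercise 1.

Answer:

$$[T]_B = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad [T]_{B'} = \begin{bmatrix} \frac{13}{11\sqrt{2}} & -\frac{25}{11\sqrt{2}} \\ \frac{5}{11\sqrt{2}} & \frac{9}{11\sqrt{2}} \end{bmatrix}$$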

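4. $T: R^3 \to R^3$ is defined by

$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 + 2x_2 - x_3 \\ -x_2 \\ x_1 + 7x_3 \end{bmatrix}$$

and $B$ is the standard basis for $R^3$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$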


















































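5. $T: R^3 \to R^3$ is the orthogonal projection on the $xy$-plane, and $B$ and $B'$ are as in Exercise 4.

Answer:

$$[T]_B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad [T]_{B'} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$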


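6. $T: R^2 \to R^2$ is defined by $T(\mathbf{x}) = 5\mathbf{x}$, and $B$ and $B'$ are the bases in Exercise 2.

7. $T: P_1 \to P_1$ is defined by $T(a_0 + a_1x) = a_0 + a_1(x + 1)$, and $B = \{\mathbf{p}_1, \mathbf{p}_2\}$ and $B' = \{\mathbf{q}_1, \mathbf{q}_2\}$, where $\mathbf{p}_1$ = [not recoverable from the source], $\mathbf{p}_2 = 10 + 2x$, $\mathbf{q}_1 = 2$, $\mathbf{q}_2 = 3 + 2x$.

Answer:

$[T]_B$ = [not recoverable from the source]

$$[T]_{B'} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$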


8. Find $\det(T)$.

(a) $T: R^2 \to R^2$, where $T(x_1, x_2) = (3x_1 - 4x_2,\ -x_1 + 7x_2)$

(b) $T: R^3 \to R^3$, where $T(x_1, x_2, x_3) = (x_1 - x_2,\ x_2 - x_3,\ x_3 - x_1)$

(c) $T: P_2 \to P_2$, where $T(p(x)) = p(x - 1)$


9. Prove that the following are similarity invariants: 

(a) rank 

(b) nullity 

(c) invertibility 


10. Let $T: P_4 \to P_4$ be the linear operator given by the formula $T(p(x)) = p(2x + 1)$.

(a) Find a matrix for T relative to some convenient basis, and then use it to find the rank and nullity of T. 

(b) Use the result in part (a) to determine whether T is one-to-one. 


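11. In each part, find a basis for $R^2$ relative to which the matrix for $T$ is diagonal.

(a) $T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + 4x_2 \\ \cdots \end{bmatrix}$ (the second component of this formula is not recoverable from the source)

(b) $T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 4x_1 - x_2 \\ -3x_1 + x_2 \end{bmatrix}$

Answer:

(a) [not recoverable from the source]

(b) $B' = \left\{ \begin{bmatrix} \frac{-3 - \sqrt{21}}{6} \\ 1 \end{bmatrix}, \begin{bmatrix} \frac{-3 + \sqrt{21}}{6} \\ 1 \end{bmatrix} \right\}$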


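12. In each part, find a basis for $R^3$ relative to which the matrix for $T$ is diagonal.

(a) $T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -2x_1 + x_2 - x_3 \\ x_1 - 2x_2 - x_3 \\ -x_1 - x_2 - 2x_3 \end{bmatrix}$

(b) $T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -x_2 + x_3 \\ -x_1 + x_3 \\ x_1 + x_2 \end{bmatrix}$

(c) $T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 4x_1 + x_3 \\ 2x_1 + 3x_2 + 2x_3 \\ x_1 + 4x_3 \end{bmatrix}$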


13. Let $T: P_2 \to P_2$ be defined by

$$T(a_0 + a_1x + a_2x^2) = (5a_0 + 6a_1 + 2a_2) - (a_1 + 8a_2)x + (a_0 - 2a_2)x^2$$

(a) Find the eigenvalues of $T$.

(b) Find bases for the eigenspaces of $T$.

Answer:

(a) $\lambda = -4$, $\lambda = 3$

(b) Basis for eigenspace corresponding to $\lambda = -4$: $-2 + \frac{8}{3}x + x^2$; basis for eigenspace corresponding to $\lambda = 3$: $5 - 2x + x^2$


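14. Let $T: M_{22} \to M_{22}$ be defined by

$$T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} 2c & a + c \\ b - 2c & d \end{bmatrix}$$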

(a) Find the eigenvalues of T. 

(b) Find bases for the eigenspaces of T. 

15. Let $\lambda$ be an eigenvalue of a linear operator $T: V \to V$. Prove that the eigenvectors of $T$ corresponding to $\lambda$ are the nonzero vectors in the kernel of $\lambda I - T$.

16. (a) Prove that if $A$ and $B$ are similar matrices, then $A^2$ and $B^2$ are also similar. More generally, prove that $A^k$ and $B^k$ are similar if $k$ is any positive integer.

(b) If $A^2$ and $B^2$ are similar, must $A$ and $B$ be similar? Explain.



17. Let $C$ and $D$ be $m \times n$ matrices, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$. Show that if $C[\mathbf{x}]_B = D[\mathbf{x}]_B$ for all $\mathbf{x}$ in $V$, then $C = D$.

18. Find two nonzero 2x2 matrices that are not similar, and explain why they are not. 

19. Complete the proof below by justifying each step. 


Hypothesis: A and B are similar matrices. 

Conclusion: A and B have the same characteristic polynomial. 
Proof: 

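1. $\det(\lambda I - B) = \det(\lambda I - P^{-1}AP)$

2. $\qquad = \det(\lambda P^{-1}P - P^{-1}AP)$

3. $\qquad = \det(P^{-1}(\lambda I - A)P)$

4. $\qquad = \det(P^{-1})\det(\lambda I - A)\det(P)$

5. $\qquad = \det(P^{-1})\det(P)\det(\lambda I - A)$

6. $\qquad = \det(\lambda I - A)$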

20. If $A$ and $B$ are similar matrices, say $B = P^{-1}AP$, then it follows from Exercise 19 that $A$ and $B$ have the same eigenvalues. Suppose that $\lambda$ is one of the common eigenvalues and $\mathbf{x}$ is a corresponding eigenvector of $A$. See if you can find an eigenvector of $B$ corresponding to $\lambda$ (expressed in terms of $\lambda$, $\mathbf{x}$, and $P$).

21. Since the standard basis for $R^n$ is so simple, why would one want to represent a linear operator on $R^n$ in another basis?
Answer: 

The choice of an appropriate basis can yield a better understanding of the linear operator. 

22. Prove that trace is a similarity invariant. 

True-False Exercises 

In parts (a)—(h) determine whether the statement is true or false, and justify your answer. 

(a) A matrix cannot be similar to itself. 

Answer: 

False 

(b) If A is similar to B , and B is similar to C, then A is similar to C. 

Answer: 

True 

(c) If A and B are similar and B is singular, then A is singular. 

Answer: 

True 

(d) If $A$ and $B$ are invertible and similar, then $A^{-1}$ and $B^{-1}$ are similar.

Answer:

True

(e) If $T_1: R^n \to R^n$ and $T_2: R^n \to R^n$ are linear operators, and if $[T_1]_{B',B} = [T_2]_{B',B}$ with respect to two bases $B$ and $B'$ for $R^n$, then $T_1(\mathbf{x}) = T_2(\mathbf{x})$ for every vector $\mathbf{x}$ in $R^n$.

Answer:

True

(f) If $T_1: R^n \to R^n$ is a linear operator, and if $[T_1]_B = [T_1]_{B'}$ with respect to two bases $B$ and $B'$ for $R^n$, then $B = B'$.

Answer:

False

(g) If $T: R^n \to R^n$ is a linear operator, and if $[T]_B = I_n$ with respect to some basis $B$ for $R^n$, then $T$ is the identity operator on $R^n$.

Answer:

True

(h) If $T: R^n \to R^n$ is a linear operator, and if $[T]_{B',B} = I_n$ with respect to two bases $B$ and $B'$ for $R^n$, then $T$ is the identity operator on $R^n$.

Answer:

False

Answer: 

False 




Supplementary Exercises 


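1. Let $A$ be an $n \times n$ matrix, $B$ a nonzero $n \times 1$ matrix, and $\mathbf{x}$ a vector in $R^n$ expressed in matrix notation. Is $T(\mathbf{x}) = A\mathbf{x} + B$ a linear operator on $R^n$? Justify your answer.

Answer:

No. $T(\mathbf{x}_1 + \mathbf{x}_2) = A(\mathbf{x}_1 + \mathbf{x}_2) + B \neq (A\mathbf{x}_1 + B) + (A\mathbf{x}_2 + B) = T(\mathbf{x}_1) + T(\mathbf{x}_2)$, and if $c \neq 1$, then $T(c\mathbf{x}) = cA\mathbf{x} + B \neq c(A\mathbf{x} + B) = cT(\mathbf{x})$.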


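2. Let

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

(a) Show that

$$A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix} \quad\text{and}\quad A^3 = \begin{bmatrix} \cos 3\theta & -\sin 3\theta \\ \sin 3\theta & \cos 3\theta \end{bmatrix}$$

(b) Based on your answer to part (a), make a guess at the form of the matrix $A^n$ for any positive integer $n$.

(c) By considering the geometric effect of multiplication by $A$, obtain the result in part (b) geometrically.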


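3. Let $T: V \to V$ be defined by $T(\mathbf{v}) = \|\mathbf{v}\|\mathbf{v}$. Show that $T$ is not a linear operator on $V$.

4. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be fixed vectors in $R^n$, and let $T: R^n \to R^m$ be the function defined by $T(\mathbf{x}) = (\mathbf{x}\cdot\mathbf{v}_1, \mathbf{x}\cdot\mathbf{v}_2, \ldots, \mathbf{x}\cdot\mathbf{v}_m)$, where $\mathbf{x}\cdot\mathbf{v}_i$ is the Euclidean inner product on $R^n$.

(a) Show that $T$ is a linear transformation.

(b) Show that the matrix with row vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is the standard matrix for $T$.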

5. Let $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4\}$ be the standard basis for $R^4$, and let $T: R^4 \to R^3$ be the linear transformation for which

$$T(\mathbf{e}_1) = (1, 2, 1), \quad T(\mathbf{e}_2) = (0, 1, 0), \quad T(\mathbf{e}_3) = (1, 3, 0), \quad T(\mathbf{e}_4) = (1, 1, 1)$$

(a) Find bases for the range and kernel of $T$.

(b) Find the rank and nullity of $T$.

Answer:

(a) $T(\mathbf{e}_3)$ and any two of $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, and $T(\mathbf{e}_4)$ form bases for the range; $(-1, 1, 0, 1)$ is a basis for the kernel.

(b) Rank = 3, nullity = 1


6. Suppose that vectors in $R^3$ are denoted by $1 \times 3$ matrices, and define $T: R^3 \to R^3$ by

$$T([x_1 \;\; x_2 \;\; x_3]) = [x_1 \;\; x_2 \;\; x_3] \begin{bmatrix} -1 & 2 & 4 \\ 3 & 0 & 1 \\ 2 & 2 & 5 \end{bmatrix}$$

(a) Find a basis for the kernel of $T$.

(b) Find a basis for the range of $T$.

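7. Let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ be a basis for a vector space $V$, and let $T: V \to V$ be the linear operator for which

$$\begin{aligned} T(\mathbf{v}_1) &= \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + 3\mathbf{v}_4 \\ T(\mathbf{v}_2) &= \mathbf{v}_1 - \mathbf{v}_2 + 2\mathbf{v}_3 + 2\mathbf{v}_4 \\ T(\mathbf{v}_3) &= 2\mathbf{v}_1 - 4\mathbf{v}_2 + 5\mathbf{v}_3 + 3\mathbf{v}_4 \\ T(\mathbf{v}_4) &= -2\mathbf{v}_1 + 6\mathbf{v}_2 - 6\mathbf{v}_3 - 2\mathbf{v}_4 \end{aligned}$$

(a) Find the rank and nullity of $T$.

(b) Determine whether $T$ is one-to-one.

Answer:

(a) $\operatorname{rank}(T) = 2$ and $\operatorname{nullity}(T) = 2$

(b) $T$ is not one-to-one.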

8. Let $V$ and $W$ be vector spaces, let $T$, $T_1$, and $T_2$ be linear transformations from $V$ to $W$, and let $k$ be a scalar. Define new transformations, $T_1 + T_2$ and $kT$, by the formulas

$$(T_1 + T_2)(\mathbf{x}) = T_1(\mathbf{x}) + T_2(\mathbf{x}), \qquad (kT)(\mathbf{x}) = k(T(\mathbf{x}))$$

(a) Show that $(T_1 + T_2): V \to W$ and $kT: V \to W$ are both linear transformations.

(b) Show that the set of all linear transformations from $V$ to $W$ with the operations in part (a) is a vector space.

9. Let $A$ and $B$ be similar matrices. Prove:

(a) $A^T$ and $B^T$ are similar.

(b) If $A$ and $B$ are invertible, then $A^{-1}$ and $B^{-1}$ are similar.

10. Fredholm Alternative Theorem. Let $T: V \to V$ be a linear operator on an $n$-dimensional vector space. Prove that exactly one of the following statements holds:

(i) The equation $T(\mathbf{x}) = \mathbf{b}$ has a solution for all vectors $\mathbf{b}$ in $V$.

(ii) Nullity of $T > 0$.


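11. Let $T: M_{22} \to M_{22}$ be the linear operator defined by

$T(X)$ = [formula not recoverable from the source; it appears to have the form of a fixed $2 \times 2$ matrix times $X$ plus $X$ times a fixed $2 \times 2$ matrix]

Find the rank and nullity of $T$.

Answer:

Rank = 3, nullity = 1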

12. Prove: If $A$ and $B$ are similar matrices, and if $B$ and $C$ are also similar matrices, then $A$ and $C$ are similar matrices.

13. Let $L: M_{22} \to M_{22}$ be the linear operator that is defined by $L(M) = M^T$. Find the matrix for $L$ with respect to the standard basis for $M_{22}$.

Answer:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

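14. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be bases for a vector space $V$, and let

$$P = \begin{bmatrix} 2 & -1 & 3 \\ 1 & 1 & 4 \\ 0 & 1 & 2 \end{bmatrix}$$

be the transition matrix from $B'$ to $B$.

(a) Express $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ as linear combinations of $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$.

(b) Express $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ as linear combinations of $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$.

15. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ be a basis for a vector space $V$, and let $T: V \to V$ be a linear operator for which

$$[T]_B = \begin{bmatrix} -3 & 4 & 7 \\ 1 & 0 & -2 \\ 0 & 1 & 0 \end{bmatrix}$$

Find $[T]_{B'}$, where $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is the basis for $V$ defined by

$$\mathbf{v}_1 = \mathbf{u}_1, \quad \mathbf{v}_2 = \mathbf{u}_1 + \mathbf{u}_2, \quad \mathbf{v}_3 = \mathbf{u}_1 + \mathbf{u}_2 + \mathbf{u}_3$$

Answer:

$$[T]_{B'} = \begin{bmatrix} -4 & 0 & 9 \\ 1 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix}$$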
0 1 1 

16. Show that the matrices [not recoverable from the source] are similar but that [not recoverable from the source] are not.

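17. Suppose that $T: V \to V$ is a linear operator, and $B$ is a basis for $V$ for which

$$[T(\mathbf{x})]_B = \begin{bmatrix} x_1 - x_2 + x_3 \\ x_2 \\ x_1 - x_3 \end{bmatrix} \quad\text{if}\quad [\mathbf{x}]_B = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

Find $[T]_B$.

Answer:

$$[T]_B = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix}$$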


18. Let $T: V \to V$ be a linear operator. Prove that $T$ is one-to-one if and only if $\det(T) \neq 0$.

19. (Calculus required)

(a) Show that if $f = f(x)$ is twice differentiable, then the function $D: C^2(-\infty, \infty) \to F(-\infty, \infty)$ defined by $D(f) = f''(x)$ is a linear transformation.

(b) Find a basis for the kernel of $D$.

(c) Show that the set of functions satisfying the equation $D(f) = f(x)$ is a two-dimensional subspace of $C^2(-\infty, \infty)$, and find a basis for this subspace.

Answer:

(b) $f(x) = x$, $g(x) = 1$

(c) $f(x) = e^x$, $g(x) = e^{-x}$

20. Let $T: P_2 \to R^3$ be the function defined by the formula

$$T(p(x)) = \begin{bmatrix} p(-1) \\ p(0) \\ p(1) \end{bmatrix}$$

(a) Find $T(x^2 + 5x + 6)$.

(b) Show that $T$ is a linear transformation.

(c) Show that $T$ is one-to-one.

(d) Find $T^{-1}\left(\begin{bmatrix} 0 \\ 3 \\ 0 \end{bmatrix}\right)$.

(e) Sketch the graph of the polynomial in part (d).

21. Let $x_1$, $x_2$, and $x_3$ be distinct real numbers such that

$$x_1 < x_2 < x_3$$

and let $T: P_2 \to R^3$ be the function defined by the formula

$$T(p(x)) = \begin{bmatrix} p(x_1) \\ p(x_2) \\ p(x_3) \end{bmatrix}$$

(a) Show that $T$ is a linear transformation.

(b) Show that $T$ is one-to-one.

(c) Verify that if $a_1$, $a_2$, and $a_3$ are any real numbers, then

$$T^{-1}\left(\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\right) = a_1P_1(x) + a_2P_2(x) + a_3P_3(x)$$

where

$$P_1(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}, \quad P_2(x) = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}, \quad P_3(x) = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}$$

(d) What relationship exists between the graph of the function

$$a_1P_1(x) + a_2P_2(x) + a_3P_3(x)$$

and the points $(x_1, a_1)$, $(x_2, a_2)$, and $(x_3, a_3)$?

Answer:

(d) The points are on the graph.


22. (Calculus required) Let $p(x)$ and $q(x)$ be continuous functions, and let $V$ be the subspace of $C(-\infty, \infty)$ consisting of all twice differentiable functions. Define $L: V \to V$ by

$$L(y(x)) = y''(x) + p(x)y'(x) + q(x)y(x)$$

(a) Show that $L$ is a linear transformation.

(b) Consider the special case where $p(x) = 0$ and $q(x) = 1$. Show that the function

$$\phi(x) = c_1 \sin x + c_2 \cos x$$

is in the kernel of $L$ for all real values of $c_1$ and $c_2$.

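23. (Calculus required) Let $D: P_n \to P_n$ be the differentiation operator $D(\mathbf{p}) = \mathbf{p}'$. Show that the matrix for $D$ relative to the basis $B = \{1, x, x^2, \ldots, x^n\}$ is

$$\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 2 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & n \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$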


24. (Calculus required) It can be shown that for any real number $c$, the vectors

$$1, \quad x - c, \quad \frac{(x - c)^2}{2!}, \quad \ldots, \quad \frac{(x - c)^n}{n!}$$

form a basis for $P_n$. Find the matrix for the differentiation operator of Exercise 23 with respect to this basis.

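25. (Calculus required) Let $J: P_n \to P_{n+1}$ be the integration transformation defined by

$$J(\mathbf{p}) = \int_0^x (a_0 + a_1t + \cdots + a_nt^n)\,dt = a_0x + \frac{a_1}{2}x^2 + \cdots + \frac{a_n}{n+1}x^{n+1}$$

where $\mathbf{p} = a_0 + a_1x + \cdots + a_nx^n$. Find the matrix for $J$ with respect to the standard bases for $P_n$ and $P_{n+1}$.

Answer:

$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & \frac{1}{2} & 0 & \cdots & 0 \\ 0 & 0 & \frac{1}{3} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \frac{1}{n+1} \end{bmatrix}$$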
CHAPTER CONTENTS

9.1 LU-Decompositions
9.2 The Power Method
9.3 Internet Search Engines
9.4 Comparison of Procedures for Solving Linear Systems
9.5 Singular Value Decomposition
9.6 Data Compression Using Singular Value Decomposition


INTRODUCTION 

This chapter is concerned with “numerical methods” of linear algebra, an area of study 
that encompasses techniques for solving large-scale linear systems and for finding 
numerical approximations of various kinds. It is not our objective to discuss algorithms 
and technical issues in fine detail, since there are many excellent books on the subject. 
Rather, we will be concerned with introducing some of the basic ideas and exploring 
important contemporary applications that rely heavily on numerical ideas—singular value 
decomposition and data compression. A computing utility such as MATLAB, 
Mathematica, or Maple is recommended for Sections 9.2 to 9.6.
9.1 LU-Decompositions

Up to now, we have focused on two methods for solving linear systems, Gaussian elimination (reduction to row 
echelon form) and Gauss-Jordan elimination (reduction to reduced row echelon form). While these methods are 
fine for the small-scale problems in this text, they are not suitable for large-scale problems in which computer 
roundoff error, memory usage, and speed are concerns. In this section we will discuss a method for solving a linear 
system of n equations in n unknowns that is based on factoring its coefficient matrix into a product of lower and 
upper triangular matrices. This method, called “LU-decomposition,” is the basis for many computer algorithms in common use.


Solving Linear Systems by Factoring 

Our first goal in this section is to show how to solve a linear system $A\mathbf{x} = \mathbf{b}$ of $n$ equations in $n$ unknowns by factoring the coefficient matrix $A$ into a product

$$A = LU \tag{1}$$

where $L$ is lower triangular and $U$ is upper triangular. Once we understand how to do this, we will discuss how to obtain the factorization itself.

Assuming that we have somehow obtained the factorization in (1), the linear system $A\mathbf{x} = \mathbf{b}$ can be solved by the following procedure, called LU-decomposition.


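The Method of LU-Decomposition

Step 1. Rewrite the system $A\mathbf{x} = \mathbf{b}$ as

$$LU\mathbf{x} = \mathbf{b} \tag{2}$$

Step 2. Define a new $n \times 1$ matrix $\mathbf{y}$ by

$$U\mathbf{x} = \mathbf{y} \tag{3}$$

Step 3. Use (3) to rewrite (2) as $L\mathbf{y} = \mathbf{b}$ and solve this system for $\mathbf{y}$.

Step 4. Substitute $\mathbf{y}$ in (3) and solve for $\mathbf{x}$.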


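This procedure, which is illustrated in Figure 9.1.1, replaces the single linear system $A\mathbf{x} = \mathbf{b}$ by a pair of linear systems

$$\begin{aligned} U\mathbf{x} &= \mathbf{y} \\ L\mathbf{y} &= \mathbf{b} \end{aligned}$$

that must be solved in succession. However, since each of these systems has a triangular coefficient matrix, it generally turns out to involve no more computation to solve the two systems than to solve the original system directly.

Figure 9.1.1 [Diagram: instead of solving $A\mathbf{x} = \mathbf{b}$ for $\mathbf{x}$ directly, solve $L\mathbf{y} = \mathbf{b}$ for $\mathbf{y}$, then solve $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$.]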
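The two-stage procedure can be sketched in a few lines of Python. This is a minimal illustration using exact `Fraction` arithmetic, assuming that $L$ and $U$ are square, triangular, and have nonzero diagonal entries; the function names are our own, and production libraries organize the computation differently.

```python
from fractions import Fraction


def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, working from the top row down."""
    n = len(L)
    y = [Fraction(0)] * n
    for i in range(n):
        known = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (Fraction(b[i]) - known) / L[i][i]
    return y


def back_substitution(U, y):
    """Solve Ux = y for upper triangular U, working from the bottom row up."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - known) / U[i][i]
    return x


def lu_solve(L, U, b):
    """Solve LUx = b: first Ly = b (Step 3), then Ux = y (Step 4)."""
    y = forward_substitution(L, b)
    return back_substitution(U, y)


# Small illustration: here LU = [[2, 6], [1, 4]] and b = (4, 5).
L = [[2, 0], [1, 1]]
U = [[1, 3], [0, 1]]
print(lu_solve(L, U, [4, 5]))
```

For this small system the sketch returns the solution $x_1 = -7$, $x_2 = 3$ (as `Fraction` objects); indeed $2(-7) + 6(3) = 4$ and $-7 + 4(3) = 5$. Each triangular solve touches every entry of its triangle once, which is why the pair of solves is cheap once $L$ and $U$ are known.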


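EXAMPLE 1 Solving Ax = b by LU-Decomposition

Later in this section we will derive the factorization

$$\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \tag{4}$$

Use this result to solve the linear system

$$\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}}$$

From (4) we can rewrite this system as

$$\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}} \tag{5}$$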


In 1979 an important library of machine-independent linear algebra 
programs called LINPACK was developed at Argonne National Laboratories. Many of the 
programs in that library use the decomposition methods that we will study in this section. 
Variations of the LINPACK routines are used in many computer programs, including 
MATLAB, Mathematica, and Maple. 


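As specified in Step 2 above, let us define $y_1$, $y_2$, and $y_3$ by the equation

$$\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\mathbf{y}} \tag{6}$$

which allows us to rewrite (5) as

$$\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\mathbf{y}} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}} \tag{7}$$

or equivalently as

$$\begin{aligned} 2y_1 &= 2 \\ -3y_1 + y_2 &= 2 \\ 4y_1 - 3y_2 + 7y_3 &= 3 \end{aligned}$$

This system can be solved by a procedure that is similar to back substitution, except that we solve the equations from the top down instead of from the bottom up. This procedure, called forward substitution, yields

$$y_1 = 1, \quad y_2 = 5, \quad y_3 = 2$$

(verify). As indicated in Step 4 above, we substitute these values into (6), which yields the linear system

$$\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}$$

or, equivalently,

$$\begin{aligned} x_1 + 3x_2 + x_3 &= 1 \\ x_2 + 3x_3 &= 5 \\ x_3 &= 2 \end{aligned}$$

Solving this system by back substitution yields

$$x_1 = 2, \quad x_2 = -1, \quad x_3 = 2$$

(verify).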
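Each “(verify)” in Example 1 amounts to a matrix product. The following Python sketch uses plain integer lists (`matmul` and `matvec` are helper names of our own) to check the factorization (4), the forward-substitution result $\mathbf{y}$, and the final solution $\mathbf{x}$:

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M, v):
    """Product of a matrix and a column vector stored as a list."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


A = [[2, 6, 2], [-3, -8, 0], [4, 9, 2]]
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
b = [2, 2, 3]
y = [1, 5, 2]    # result of forward substitution
x = [2, -1, 2]   # result of back substitution

checks = {
    "LU = A": matmul(L, U) == A,
    "Ly = b": matvec(L, y) == b,
    "Ux = y": matvec(U, x) == y,
    "Ax = b": matvec(A, x) == b,
}
for name, ok in checks.items():
    print(name, ok)
```

All four checks print `True`.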



Alan Mathison Turing (1912-1954) 

Although the ideas were known earlier, credit for popularizing the matrix formulation of the LU-decomposition is often given to the British mathematician Alan Turing for his work on the subject in 1948. Turing, one of the great geniuses of the twentieth century, is the founder of the field of artificial intelligence. Among his many accomplishments in that field, he developed the concept of an internally programmed computer before the practical technology had reached the point where the construction of such a machine was possible. During World War II Turing was secretly recruited by the British government's Code and Cypher School at Bletchley Park to help break the Nazi Enigma codes; it was Turing's statistical approach that provided the breakthrough. In addition to being a brilliant mathematician, Turing was a world-class runner who competed successfully against Olympic-level athletes. Sadly, Turing, a homosexual, was tried and convicted of “gross indecency” in 1952, in violation of the then-existing British statutes. Depressed, he committed suicide at age 41 by eating an apple laced with cyanide.

[Image: Time & Life Pictures/Getty Images, Inc.] 


Finding LU-Decompositions 

Example 1 makes it clear that after A is factored into lower and upper triangular matrices, the system Ax = b can 
be solved by one forward substitution and one back substitution. We will now show how to obtain such 
factorizations. We begin with some terminology. 


DEFINITION 1 

A factorization of a square matrix $A$ as $A = LU$, where $L$ is lower triangular and $U$ is upper triangular, is called an LU-decomposition (or LU-factorization) of $A$.


Not every square matrix has an LU-decomposition. However, we will see that if it is possible to reduce a square matrix $A$ to row echelon form by Gaussian elimination without performing any row interchanges, then $A$ will have an LU-decomposition, though it may not be unique. To see why this is so, assume that $A$ has been reduced to a row echelon form $U$ using a sequence of row operations that does not include row interchanges. We know from Theorem 1.5.1 that these operations can be accomplished by multiplying $A$ on the left by an appropriate sequence of elementary matrices; that is, there exist elementary matrices $E_1, E_2, \ldots, E_k$ such that

$$E_k \cdots E_2 E_1 A = U \tag{8}$$

Since elementary matrices are invertible, we can solve (8) for $A$ as

$$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U$$

or more briefly as

$$A = LU \tag{9}$$

where

$$L = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{10}$$


We now have all of the ingredients to prove the following result. 


THEOREM 9.1.1 

If $A$ is a square matrix that can be reduced to a row echelon form $U$ by Gaussian elimination without row interchanges, then $A$ can be factored as $A = LU$, where $L$ is a lower triangular matrix.


Let $L$ and $U$ be the matrices in Formulas (10) and (8), respectively. The matrix $U$ is upper triangular because it is a row echelon form of a square matrix (so all entries below its main diagonal are zero). To prove that $L$ is lower triangular, it suffices to prove that each factor on the right side of (10) is lower triangular, since Theorem 1.7.1(b) will then imply that $L$ itself is lower triangular. Since row interchanges are excluded, each $E_j$ results either by adding a scalar multiple of one row of an identity matrix to a row below or by multiplying one row of an identity matrix by a nonzero scalar. In either case, the resulting matrix $E_j$ is lower triangular and hence so is $E_j^{-1}$ by Theorem 1.7.1(d). This completes the proof.


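EXAMPLE 2 An LU-Decomposition

Find an LU-decomposition of

$$A = \begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}$$

To obtain an LU-decomposition, $A = LU$, we will reduce $A$ to a row echelon form $U$ using Gaussian elimination and then calculate $L$ from (10). The steps are as follows; for each step we show the row operation, the result of the reduction so far, the elementary matrix corresponding to the row operation, and the inverse of that elementary matrix.

Step 1. Row operation: $\tfrac{1}{2} \times \text{row } 1$.

$$\begin{bmatrix} 1 & 3 & 1 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}, \qquad E_1 = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad E_1^{-1} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Step 2. Row operation: $(3 \times \text{row } 1) + \text{row } 2$.

$$\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 4 & 9 & 2 \end{bmatrix}, \qquad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Step 3. Row operation: $(-4 \times \text{row } 1) + \text{row } 3$.

$$\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & -3 & -2 \end{bmatrix}, \qquad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \qquad E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}$$

Step 4. Row operation: $(3 \times \text{row } 2) + \text{row } 3$.

$$\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 7 \end{bmatrix}, \qquad E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{bmatrix}, \qquad E_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}$$

Step 5. Row operation: $\tfrac{1}{7} \times \text{row } 3$.

$$\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}, \qquad E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{7} \end{bmatrix}, \qquad E_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}$$

Thus,

$$U = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$$

and, from (10),

$$L = E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}E_5^{-1} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}$$

so

$$\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$$

is an LU-decomposition of $A$.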
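Formula (10) can be checked directly for Example 2. The Python sketch below uses exact `Fraction` arithmetic; the constructors `scale_row` and `add_multiple` are names of our own. It builds each elementary matrix and its inverse, applies $E_5E_4E_3E_2E_1$ to $A$ to obtain $U$, and multiplies the inverses in the order $E_1^{-1}E_2^{-1}\cdots E_5^{-1}$ to obtain $L$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def identity(n):
    """n x n identity matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def scale_row(n, i, k):
    """Elementary matrix that multiplies row i by the nonzero scalar k."""
    E = identity(n)
    E[i][i] = Fraction(k)
    return E


def add_multiple(n, src, dst, k):
    """Elementary matrix that adds k times row src to row dst."""
    E = identity(n)
    E[dst][src] = Fraction(k)
    return E


A = [[2, 6, 2], [-3, -8, 0], [4, 9, 2]]

# (E_j, E_j^{-1}) for Steps 1-5 of Example 2 (rows are 0-indexed here).
steps = [
    (scale_row(3, 0, Fraction(1, 2)), scale_row(3, 0, 2)),   # 1/2 x row 1
    (add_multiple(3, 0, 1, 3), add_multiple(3, 0, 1, -3)),   # 3 x row 1 + row 2
    (add_multiple(3, 0, 2, -4), add_multiple(3, 0, 2, 4)),   # -4 x row 1 + row 3
    (add_multiple(3, 1, 2, 3), add_multiple(3, 1, 2, -3)),   # 3 x row 2 + row 3
    (scale_row(3, 2, Fraction(1, 7)), scale_row(3, 2, 7)),   # 1/7 x row 3
]

U = A
for E, _ in steps:
    U = matmul(E, U)          # builds E_5 ... E_1 A
L = identity(3)
for _, E_inv in steps:
    L = matmul(L, E_inv)      # builds E_1^{-1} ... E_5^{-1}
print(U)
print(L)
```

The printed matrices agree with $U$ and $L$ in Example 2, and their product is $A$.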


Bookkeeping 

As Example 2 shows, most of the work in constructing an LU-decomposition is expended in calculating $L$. However, all this work can be eliminated by some careful bookkeeping of the operations used to reduce $A$ to $U$.

Because we are assuming that no row interchanges are required to reduce $A$ to $U$, there are only two types of operations involved: multiplying a row by a nonzero constant, and adding a scalar multiple of one row to another. The first operation is used to introduce the leading 1's and the second to introduce zeros below the leading 1's.

In Example 2, a multiplier of $\tfrac{1}{2}$ was needed in Step 1 to introduce a leading 1 in the first row, and a multiplier of $\tfrac{1}{7}$ was needed in Step 5 to introduce a leading 1 in the third row. No actual multiplier was required to introduce a leading 1 in the second row because it was already a 1 at the end of Step 2, but for convenience let us say that the multiplier was 1. Comparing these multipliers with the successive diagonal entries of $L$, we see that these diagonal entries are precisely the reciprocals of the multipliers used to construct $U$:

$$L = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \tag{11}$$

Also observe in Example 2 that to introduce zeros below the leading 1 in the first row, we used the operations

add 3 times the first row to the second
add $-4$ times the first row to the third

and to introduce the zero below the leading 1 in the second row, we used the operation

add 3 times the second row to the third

Now note in (12) that in each position below the main diagonal of $L$, the entry is the negative of the multiplier in the operation that introduced the zero in that position in $U$:

$$L = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \tag{12}$$

This suggests the following procedure for constructing an LU-decomposition of a square matrix $A$, assuming that this matrix can be reduced to row echelon form without row interchanges.
























Procedure for Constructing an LU-Decomposition

Step 1. Reduce $A$ to a row echelon form $U$ by Gaussian elimination without row interchanges, keeping track of the multipliers used to introduce the leading 1's and the multipliers used to introduce the zeros below the leading 1's.

Step 2. In each position along the main diagonal of $L$, place the reciprocal of the multiplier that introduced the leading 1 in that position in $U$.

Step 3. In each position below the main diagonal of $L$, place the negative of the multiplier used to introduce the zero in that position in $U$.

Step 4. Form the decomposition $A = LU$.
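The four-step procedure translates almost line for line into code. Below is a minimal Python sketch with exact `Fraction` arithmetic (the function name is our own) that follows the convention of this section, in which $U$ has leading 1's and $L$ collects the reciprocals and negated multipliers. It assumes a square matrix whose pivots are all nonzero when no row interchanges are made, and it raises an error otherwise rather than attempting to handle singular or interchange-requiring cases.

```python
from fractions import Fraction


def lu_bookkeeping(A):
    """LU-decomposition of a square matrix by the bookkeeping procedure.

    Assumes no row interchanges are needed and every pivot is nonzero.
    """
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        pivot = U[j][j]
        if pivot == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        # Multiplying row j by 1/pivot introduces the leading 1;
        # Step 2 puts the reciprocal of that multiplier (the pivot) on L's diagonal.
        L[j][j] = pivot
        U[j] = [entry / pivot for entry in U[j]]
        for i in range(j + 1, n):
            # Adding (-factor) times row j to row i introduces a zero;
            # Step 3 puts the negative of that multiplier (factor) into L.
            factor = U[i][j]
            L[i][j] = factor
            U[i] = [a - factor * b for a, b in zip(U[i], U[j])]
    return L, U


L, U = lu_bookkeeping([[2, 6, 2], [-3, -8, 0], [4, 9, 2]])
print(L)
print(U)
```

Applied to the matrix of Example 2, the sketch reproduces the $L$ in (11) and the $U$ found in that example, without ever forming an elementary matrix.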


EXAMPLE 3 Constructing an LU-Decomposition

Find an LU-decomposition of

$$A = \begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}$$

We will reduce $A$ to a row echelon form $U$ and at each step we will fill in an entry of $L$ in accordance with the four-step procedure above.

[Table: the successive reductions of $A$ toward $U$ alongside the partially completed $L$, in which • denotes an unknown entry of $L$. In the last step no actual operation is performed, since there is already a leading 1 in the third row.]

Thus, we have constructed the LU-decomposition

$$A = LU = \begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix} \begin{bmatrix} 1 & -\frac{1}{3} & 0 \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}$$


We leave it for you to confirm this end result by multiplying the factors. 
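For the confirmation requested above, a short Python check with exact fractions suffices; the matrix $A$ here is the one in Example 3.

```python
from fractions import Fraction

L = [[6, 0, 0], [9, 2, 0], [3, 8, 1]]
U = [[1, Fraction(-1, 3), 0], [0, 1, Fraction(1, 2)], [0, 0, 1]]
A = [[6, -2, 0], [9, -1, 1], [3, 7, 5]]

# Entry (i, j) of LU is the dot product of row i of L with column j of U.
product = [[sum(L[i][k] * U[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product == A)
```

The comparison prints `True`.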


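LU-Decompositions Are Not Unique

In the absence of restrictions, LU-decompositions are not unique. For example, if

$$A = LU = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}$$

and $L$ has nonzero diagonal entries, then we can shift the diagonal entries from the left factor to the right factor by writing

$$A = \begin{bmatrix} 1 & 0 & 0 \\ \frac{l_{21}}{l_{11}} & 1 & 0 \\ \frac{l_{31}}{l_{11}} & \frac{l_{32}}{l_{22}} & 1 \end{bmatrix} \begin{bmatrix} l_{11} & 0 & 0 \\ 0 & l_{22} & 0 \\ 0 & 0 & l_{33} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \frac{l_{21}}{l_{11}} & 1 & 0 \\ \frac{l_{31}}{l_{11}} & \frac{l_{32}}{l_{22}} & 1 \end{bmatrix} \begin{bmatrix} l_{11} & l_{11}u_{12} & l_{11}u_{13} \\ 0 & l_{22} & l_{22}u_{23} \\ 0 & 0 & l_{33} \end{bmatrix}$$

which is another LU-decomposition of $A$.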
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LDU-Decompositions 


The method we have described for computing LU-decompositions may result in an “asymmetry” in that the matrix 
U has 1's on the main diagonal but L need not. However, if it is preferred to have 1's on the main diagonal of the 
lower triangular factor, then we can “shift” the diagonal entries of L to a diagonal matrix D and write L as 

L = L'D 


where L' is a lower triangular matrix with 1's on the main diagonal. For example, a general 3 × 3 lower triangular 
matrix with nonzero entries on the main diagonal can be factored as 


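$$\underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}}_{L} = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ a_{21}/a_{11} & 1 & 0 \\ a_{31}/a_{11} & a_{32}/a_{22} & 1 \end{bmatrix}}_{L'} \underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}}_{D}$$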


Note that the columns of L' are obtained by dividing each entry in the corresponding column of L by the diagonal 
entry in the column. Thus, for example, we can rewrite (4) as 


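$$\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -\tfrac{3}{2} & 1 & 0 \\ 2 & -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$$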


One can prove that if A is a square matrix that can be reduced to row echelon form without row interchanges, then 
A can be factored uniquely as 

A = LDU 

where L is a lower triangular matrix with 1's on the main diagonal, D is a diagonal matrix, and U is an upper 
triangular matrix with 1's on the main diagonal. This is called the LDU-decomposition (or LDU-factorization) of 
A. 
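The rule for obtaining L' (divide each column of L by its diagonal entry) is easy to sketch in code. The function `split_diagonal` below is our own illustration, assuming L is lower triangular with nonzero diagonal entries; it reproduces the factors in the rewrite of (4) above.

```python
from fractions import Fraction


def split_diagonal(L):
    """Write a lower triangular L with nonzero diagonal as L = L'D:
    D holds the diagonal of L, and each column of L' is the corresponding
    column of L divided by its diagonal entry (so L' has 1's on the diagonal)."""
    n = len(L)
    D = [[Fraction(L[i][i]) if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    L_unit = [[Fraction(L[i][j]) / L[j][j] for j in range(n)] for i in range(n)]
    return L_unit, D


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


# The factors L and U of the matrix in (4)
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
L_unit, D = split_diagonal(L)
print([[str(x) for x in row] for row in L_unit])  # [['1', '0', '0'], ['-3/2', '1', '0'], ['2', '-3', '1']]
assert matmul(matmul(L_unit, D), U) == [[2, 6, 2], [-3, -8, 0], [4, 9, 2]]  # A = L'DU
```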


PLU-Decompositions 


Many computer algorithms for solving linear systems perform row interchanges to reduce roundoff error, in which 
case the existence of an LU-decomposition is not guaranteed. However, it is possible to work around this problem 
by “preprocessing” the coefficient matrix A so that the row interchanges are performed prior to computing the 
LU-decomposition itself. More specifically, the idea is to create a matrix Q (called a permutation matrix) by 
multiplying, in sequence, those elementary matrices that produce the row interchanges and then execute them by 
computing the product QA. This product can then be reduced to row echelon form without row interchanges, so it is 
assured to have an LU-decomposition 


QA = LU (13) 

Because the matrix Q is invertible (being a product of elementary matrices), the systems Ax = b and QAx = Qb 
will have the same solutions. But it follows from (13) that the latter system can be rewritten as LUx = Qb and hence 
can be solved using LU-decomposition. 

It is common to see Equation (13) expressed as 

A = PLU     (14) 

in which $P = Q^{-1}$. This is called a PLU-decomposition (or PLU-factorization) of A. 
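The preprocessing idea can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of any particular library. We assume A is invertible, and we perform a row interchange only when a zero pivot is encountered. Production software typically chooses the largest available pivot instead (partial pivoting) to limit roundoff. The function names are our own, and `lu_no_interchanges` is the four-step procedure from earlier in this section.

```python
from fractions import Fraction


def lu_no_interchanges(A):
    """Four-step procedure (pivots in L, 1's on the diagonal of U).
    Assumes no zero pivot is met."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        pivot = U[j][j]
        L[j][j] = pivot
        U[j] = [x / pivot for x in U[j]]
        for i in range(j + 1, n):
            m = U[i][j]
            L[i][j] = m
            U[i] = [a - m * b for a, b in zip(U[i], U[j])]
    return L, U


def row_order(A):
    """Rehearse the reduction of A, swapping in the first lower row with a nonzero
    entry whenever a pivot is zero; return the resulting order of A's rows.
    Assumes A is invertible, so a nonzero pivot always exists."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    order = list(range(n))
    for j in range(n):
        k = next(i for i in range(j, n) if M[i][j] != 0)
        M[j], M[k] = M[k], M[j]
        order[j], order[k] = order[k], order[j]
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]
            M[i] = [a - m * b for a, b in zip(M[i], M[j])]
    return order


def plu(A):
    """Return P, L, U with A = PLU, where QA = LU and P = Q^{-1}."""
    n = len(A)
    order = row_order(A)
    QA = [A[r] for r in order]  # the row interchanges, performed first
    L, U = lu_no_interchanges(QA)
    # Q has a 1 in row j, column order[j]; P = Q^{-1} is the transpose of Q.
    P = [[1 if order[j] == i else 0 for j in range(n)] for i in range(n)]
    return P, L, U


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[0, 3, -2], [1, 1, 4], [2, 2, 5]]  # a matrix with a zero in the first pivot position
P, L, U = plu(A)
assert matmul(P, matmul(L, U)) == A
```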


Concept Review 

LU-decomposition 
LDU-decomposition 
PLU-decomposition 

Skills 

Determine whether a square matrix has an LU-decomposition. 
Find an LU-decomposition of a square matrix. 

Use the method of LU-decomposition to solve linear systems. 
Find the LDU-decomposition of a square matrix. 

Find a PLU-decomposition of a square matrix. 


Exercise Set 9.1 

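1. Use the method of Example 1 and the LU-decomposition 

$$\begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$$

to solve the system 

$$\begin{aligned} 3x_1 - 6x_2 &= 0 \\ -2x_1 + 5x_2 &= 1 \end{aligned}$$

Answer: 

$x_1 = 2, \; x_2 = 1$ 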

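2. Use the method of Example 1 and the LU-decomposition 

$$\begin{bmatrix} 3 & -6 & -3 \\ 2 & 0 & 6 \\ -4 & 7 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ 2 & 4 & 0 \\ -4 & -1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$

to solve the system 

$$\begin{aligned} 3x_1 - 6x_2 - 3x_3 &= -3 \\ 2x_1 + 6x_3 &= -22 \\ -4x_1 + 7x_2 + 4x_3 &= 3 \end{aligned}$$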

In Exercises 3-10, find an LU-decomposition of the coefficient matrix, and then use the method of Example 1 to 
solve the system. 

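3. $$\begin{bmatrix} 2 & 8 \\ -1 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -2 \\ -2 \end{bmatrix}$$

Answer: 

$x_1 = 3, \; x_2 = -1$ 

4. $$\begin{bmatrix} -5 & -10 \\ 6 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -10 \\ 19 \end{bmatrix}$$

5. $$\begin{bmatrix} 2 & -2 & -2 \\ 0 & -2 & 2 \\ -1 & 5 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -4 \\ -2 \\ 6 \end{bmatrix}$$

Answer: 

$x_1 = -1, \; x_2 = 1, \; x_3 = 0$ 

6. $$\begin{bmatrix} -3 & 12 & -6 \\ 1 & -2 & 2 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -33 \\ 7 \\ -1 \end{bmatrix}$$

7. $$\begin{bmatrix} 5 & 5 & 10 \\ -8 & -7 & -9 \\ 0 & 4 & 26 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 4 \end{bmatrix}$$

Answer: 

$x_1 = -1, \; x_2 = 1, \; x_3 = 0$ 

8. $$\begin{bmatrix} -1 & -3 & -4 \\ 3 & 10 & -10 \\ -2 & -4 & 11 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -6 \\ -3 \\ 9 \end{bmatrix}$$

9. $$\begin{bmatrix} -1 & 0 & 1 & 0 \\ 2 & 3 & -2 & 6 \\ 0 & -1 & 2 & 0 \\ 0 & 0 & 1 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 3 \\ 7 \end{bmatrix}$$

Answer: 

$x_1 = -3, \; x_2 = 1, \; x_3 = 2, \; x_4 = 1$ 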
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' 8 ' 

1 

2 

12 

0 

*2 
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0 

-1 

-4 

-5 

*3 


1 

0 

0 

2 

11 

x 4 


0 



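11. Let $A = \begin{bmatrix} 2 & 1 & -1 \\ -2 & -1 & 2 \\ 2 & 1 & 0 \end{bmatrix}$. 

(a) Find an LU-decomposition of A. 

(b) Express A in the form $A = L_1DU_1$, where $L_1$ is lower triangular with 1's along the main diagonal, $U_1$ is 
upper triangular, and D is a diagonal matrix. 

(c) Express A in the form $A = L_2U_2$, where $L_2$ is lower triangular with 1's along the main diagonal and $U_2$ is 
upper triangular. 

Answer: 

(a) $$A = LU = \begin{bmatrix} 2 & 0 & 0 \\ -2 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

(b) $$A = L_1DU_1 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

(c) $$A = L_2U_2 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & -1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$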
In Exercises 12-13, find an LDU-decomposition of A. 

12. $A = \begin{bmatrix} 2 & 2 \\ 4 & 1 \end{bmatrix}$ 

13. $A = \begin{bmatrix} 3 & -12 & 6 \\ 0 & 2 & 0 \\ 6 & -28 & 13 \end{bmatrix}$ 

Answer: 

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -2 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -4 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

14. (a) Show that the matrix 

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

has no LU-decomposition. 

(b) Find a PLU-decomposition of this matrix. 

In Exercises 15-16, use the given PLU-decomposition of A to solve the linear system Ax = b by rewriting it as 
$P^{-1}Ax = P^{-1}b$ and solving this system by LU-decomposition. 


15. 
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1 4' 
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1 

; A = 
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0 0 
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1 ( 

D 

0 

1 
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0 

0 1 

3 

-5 

1 

0 

0 

17 


= PLU 


Answer: 

$x_1 = \tfrac{21}{17}, \; x_2 = -\tfrac{14}{17}, \; x_3 = \tfrac{12}{17}$ 


16. 


b = 


; A = 


4 1 2 
0 2 1 
8 1 8 



l 

O 

o 

- 1 

o 

o 

"4 1 2' 

A = 

0 0 1 

2 1 0 

0-14 


0 1 o_ 

0 -2 1 

o\ 

o 

o 


= PLU 


In Exercises 17-18, find a PLU-decomposition of A, and use it to solve the linear system Ax = b by the method 
of Exercises 15 and 16. 


17. 

"3 -1 0" 


- 2 " 

A = 

3 -1 1 

; b = 
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Csl 

O 

_i 
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Answer: 



O 
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A = 

0 0 1 

0 2 0 


0 1 0 

3 0 lj 





I 
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1 
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0 

1 
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* 1 = *2 = ^, *3 = 3 


18. $A = \begin{bmatrix} 0 & 3 & -2 \\ 1 & 1 & 4 \\ 2 & 2 & 5 \end{bmatrix}$; $b = \begin{bmatrix} 1 \\ 5 \\ -2 \end{bmatrix}$ 


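19. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. 

(a) Prove: If a ≠ 0, then the matrix A has a unique LU-decomposition with 1's along the main diagonal of L. 

(b) Find the LU-decomposition described in part (a). 

Answer: 

(b) $$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ c/a & 1 \end{bmatrix} \begin{bmatrix} a & b \\ 0 & d - bc/a \end{bmatrix}$$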


20. Let Ax = b be a linear system of n equations in n unknowns, and assume that A is an invertible matrix that can 
be reduced to row echelon form without row interchanges. How many additions and multiplications are 
required to solve the system by the method of Example 1? 

21. Prove: If A is any n × n matrix, then A can be factored as A = PLU, where L is lower triangular, U is upper 
triangular, and P can be obtained by interchanging the rows of $I_n$ appropriately. [Hint: Let U be a row echelon 
form of A, and let all row interchanges required in the reduction of A to U be performed first.] 

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) Every square matrix has an LU-decomposition. 

Answer: 

False 

(b) If a square matrix A is row equivalent to an upper triangular matrix U, then A has an LU-decomposition. 

Answer: 

False 

(c) If $L_1, L_2, \ldots, L_k$ are n × n lower triangular matrices, then the product $L_1L_2 \cdots L_k$ is lower triangular. 

Answer: 

True 

(d) If a square matrix A has an LU-decomposition, then A has a unique LDU-decomposition. 

Answer: 

True 

(e) Every square matrix has a PLU-decomposition. 

Answer: 

True 
9.2 The Power Method 

The eigenvalues of a square matrix can, in theory, be found by solving the characteristic equation. However, this 
procedure has so many computational difficulties that it is almost never used in applications. In this section we will 
discuss an algorithm that can be used to approximate the eigenvalue with greatest absolute value and a corresponding 
eigenvector. This particular eigenvalue and its corresponding eigenvectors are important because they arise naturally in 
many iterative processes. The methods we will study in this section have recently been used to create Internet search 
engines such as Google. We will discuss this application in the next section. 


The Power Method 

There are many applications in which some vector $x_0$ in $R^n$ is multiplied repeatedly by an n × n matrix A to produce a 
sequence 

$$x_0, \quad Ax_0, \quad A^2x_0, \quad \ldots, \quad A^kx_0, \quad \ldots$$

We call a sequence of this form a power sequence generated by A. In this section we will be concerned with the 
convergence of power sequences and how such sequences can be used to approximate eigenvalues and eigenvectors. 
For this purpose, we make the following definition. 


DEFINITION 1 

If the distinct eigenvalues of a matrix A are $\lambda_1, \lambda_2, \ldots, \lambda_k$, and if $|\lambda_1|$ is larger than $|\lambda_2|, \ldots, |\lambda_k|$, then $\lambda_1$ is 
called a dominant eigenvalue of A. Any eigenvector corresponding to a dominant eigenvalue is called a 
dominant eigenvector of A. 


EXAMPLE 1 Dominant Eigenvalues 

Some matrices have dominant eigenvalues and some do not. For example, if the distinct eigenvalues of a 
matrix are 

$$\lambda_1 = -4, \quad \lambda_2 = -2, \quad \lambda_3 = 1, \quad \lambda_4 = 3$$

then $\lambda_1 = -4$ is dominant since $|\lambda_1| = 4$ is greater than the absolute values of all the other eigenvalues; 
but if the distinct eigenvalues of a matrix are 

$$\lambda_1 = 7, \quad \lambda_2 = -7, \quad \lambda_3 = -2, \quad \lambda_4 = 5$$

then $|\lambda_1| = |\lambda_2| = 7$, so there is no eigenvalue whose absolute value is greater than the absolute value of 
all the other eigenvalues. 
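Definition 1 translates into a simple check on a list of distinct eigenvalues. The helper below is a minimal sketch of our own (the name `dominant_eigenvalue` is hypothetical). It assumes the eigenvalues are already known and real, and returns `None` when two eigenvalues tie for the largest absolute value.

```python
def dominant_eigenvalue(eigenvalues):
    """Return the dominant eigenvalue of a list of distinct eigenvalues, or None
    if no eigenvalue is strictly larger in absolute value than all the others."""
    ranked = sorted(eigenvalues, key=abs, reverse=True)
    if len(ranked) > 1 and abs(ranked[0]) == abs(ranked[1]):
        return None  # a tie in absolute value, as with 7 and -7
    return ranked[0]


print(dominant_eigenvalue([-4, -2, 1, 3]))  # -4
print(dominant_eigenvalue([7, -7, -2, 5]))  # None
```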


The most important theorems about convergence of power sequences apply to n x n matrices with n linearly 
independent eigenvectors (symmetric matrices, for example), so we will limit our discussion to this case in this section. 


THEOREM 9.2.1 


Let A be a symmetric n × n matrix with a positive dominant eigenvalue λ. If $x_0$ is a unit vector in $R^n$ that is 
not orthogonal to the eigenspace corresponding to λ, then the normalized power sequence 

$$x_0, \quad x_1 = \frac{Ax_0}{\|Ax_0\|}, \quad x_2 = \frac{Ax_1}{\|Ax_1\|}, \quad \ldots, \quad x_k = \frac{Ax_{k-1}}{\|Ax_{k-1}\|}, \quad \ldots \qquad (1)$$

converges to a unit dominant eigenvector, and the sequence 

$$Ax_1 \cdot x_1, \quad Ax_2 \cdot x_2, \quad Ax_3 \cdot x_3, \quad \ldots, \quad Ax_k \cdot x_k, \quad \ldots \qquad (2)$$

converges to the dominant eigenvalue λ. 


In the exercises we will ask you to show that (1) can also be expressed as 

$$x_0, \quad x_1 = \frac{Ax_0}{\|Ax_0\|}, \quad x_2 = \frac{A^2x_0}{\|A^2x_0\|}, \quad \ldots, \quad x_k = \frac{A^kx_0}{\|A^kx_0\|}, \quad \ldots \qquad (3)$$

This form of the power sequence expresses each iterate in terms of the starting vector $x_0$, rather than in terms of its 
predecessor. 
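The equivalence of (1) and (3) can be checked numerically. The sketch below uses helper names of our own choosing. It computes $x_1, \ldots, x_5$ both ways for the symmetric matrix used in Example 2 and confirms that the results agree to floating-point accuracy.

```python
import math


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def normalize(v):
    """Divide v by its Euclidean norm."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def iterate_recursive(A, x0, k):
    """Formula (1): x_k = A x_{k-1} / ||A x_{k-1}||, applied k times."""
    x = list(x0)
    for _ in range(k):
        x = normalize(matvec(A, x))
    return x


def iterate_direct(A, x0, k):
    """Formula (3): x_k = A^k x_0 / ||A^k x_0||, normalizing only once at the end."""
    y = list(x0)
    for _ in range(k):
        y = matvec(A, y)
    return normalize(y)


A = [[3, 2], [2, 3]]
x0 = [1.0, 0.0]
for k in range(1, 6):
    a, b = iterate_recursive(A, x0, k), iterate_direct(A, x0, k)
    assert all(math.isclose(p, q) for p, q in zip(a, b))
```

One practical reason to prefer (1) in computation is that the entries of $A^kx_0$ can overflow or underflow for large k, whereas every iterate of (1) stays a unit vector.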


We will not prove Theorem 9.2.1, but we can make it plausible geometrically in the 2 × 2 case where A is a symmetric 
matrix with distinct positive eigenvalues, $\lambda_1$ and $\lambda_2$, one of which is dominant. To be specific, assume that $\lambda_1$ is 
dominant and 

$$\lambda_1 > \lambda_2 > 0$$

Since we are assuming that A is symmetric and has distinct eigenvalues, it follows from Theorem 7.2.2 that the 
eigenspaces corresponding to $\lambda_1$ and $\lambda_2$ are perpendicular lines through the origin. Thus, the assumption that $x_0$ is a 
unit vector that is not orthogonal to the eigenspace corresponding to $\lambda_1$ implies that $x_0$ does not lie in the eigenspace 
corresponding to $\lambda_2$. To see the geometric effect of multiplying $x_0$ by A, it will be useful to split $x_0$ into the sum 

$$x_0 = v_0 + w_0 \qquad (4)$$

where $v_0$ and $w_0$ are the orthogonal projections of $x_0$ on the eigenspaces of $\lambda_1$ and $\lambda_2$, respectively (Figure 9.2.1a). 




This enables us to express $Ax_0$ as 

$$Ax_0 = Av_0 + Aw_0 = \lambda_1 v_0 + \lambda_2 w_0 \qquad (5)$$















which tells us that multiplying $x_0$ by A “scales” the terms $v_0$ and $w_0$ in (4) by $\lambda_1$ and $\lambda_2$, respectively. However, $\lambda_1$ is 
larger than $\lambda_2$, so the scaling is greater in the direction of $v_0$ than in the direction of $w_0$. Thus, multiplying $x_0$ by A 
“pulls” $x_0$ toward the eigenspace of $\lambda_1$, and normalizing produces a vector $x_1 = Ax_0/\|Ax_0\|$, which is on the unit 
circle and is closer to the eigenspace of $\lambda_1$ than $x_0$ (Figure 9.2.1b). Similarly, multiplying $x_1$ by A and normalizing 
produces a unit vector $x_2$ that is closer to the eigenspace of $\lambda_1$ than $x_1$. Thus, it seems reasonable that by repeatedly 
multiplying by A and normalizing we will produce a sequence of vectors $x_k$ that lie on the unit circle and converge to a 
unit vector x in the eigenspace of $\lambda_1$ (Figure 9.2.1c). Moreover, if $x_k$ converges to x, then it also seems reasonable that 
$Ax_k \cdot x_k$ will converge to 

$$Ax \cdot x = \lambda_1 x \cdot x = \lambda_1\|x\|^2 = \lambda_1$$

which is the dominant eigenvalue of A. 


The Power Method with Euclidean Scaling 

Theorem 9.2.1 provides us with an algorithm for approximating the dominant eigenvalue and a corresponding unit 
eigenvector of a symmetric matrix A, provided the dominant eigenvalue is positive. This algorithm, called the power 
method with Euclidean scaling, is as follows: 

The Power Method with Euclidean Scaling 

Step 1. Choose an arbitrary nonzero vector and normalize it, if need be, to obtain a unit vector $x_0$. 

Step 2. Compute $Ax_0$ and normalize it to obtain the first approximation $x_1$ to a dominant unit eigenvector. 
Compute $Ax_1 \cdot x_1$ to obtain the first approximation to the dominant eigenvalue. 

Step 3. Compute $Ax_1$ and normalize it to obtain the second approximation $x_2$ to a dominant unit eigenvector. 
Compute $Ax_2 \cdot x_2$ to obtain the second approximation to the dominant eigenvalue. 

Step 4. Compute $Ax_2$ and normalize it to obtain the third approximation $x_3$ to a dominant unit eigenvector. 
Compute $Ax_3 \cdot x_3$ to obtain the third approximation to the dominant eigenvalue. 

Continuing in this way will usually generate a sequence of better and better approximations to the dominant 
eigenvalue and a corresponding unit eigenvector. 
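The steps above translate directly into a loop. The following is a minimal pure-Python sketch; the function name `power_method_euclidean` is our own, not from any library. It assumes A is a symmetric matrix with a positive dominant eigenvalue, as in Theorem 9.2.1, and returns the pairs $(x_k, \lambda^{(k)})$ so the approximations can be compared step by step.

```python
import math


def power_method_euclidean(A, x0, iterations):
    """Power method with Euclidean scaling.
    Returns [(x_1, lambda_1), ..., (x_m, lambda_m)] with lambda_k = A x_k . x_k."""
    def matvec(M, v):
        return [sum(a * b for a, b in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    norm0 = math.sqrt(dot(x0, x0))
    x = [c / norm0 for c in x0]  # Step 1: unit starting vector x_0
    history = []
    for _ in range(iterations):
        y = matvec(A, x)                  # A x_{k-1}
        norm_y = math.sqrt(dot(y, y))
        x = [c / norm_y for c in y]       # x_k = A x_{k-1} / ||A x_{k-1}||
        lam = dot(matvec(A, x), x)        # lambda^(k) = A x_k . x_k
        history.append((x, lam))
    return history


for k, (x, lam) in enumerate(power_method_euclidean([[3, 2], [2, 3]], [1, 0], 5), start=1):
    print(k, [round(c, 5) for c in x], round(lam, 5))
```

Run on the matrix and starting vector of Example 2, the loop should reproduce the approximations tabulated there, up to rounding in the last digit.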


EXAMPLE 2 The Power Method with Euclidean Scaling 


Apply the power method with Euclidean scaling to 

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \quad \text{with} \quad x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Stop at $x_5$ and compare the resulting approximations to the exact values of the dominant eigenvalue and 
eigenvector. 

We will leave it for you to show that the eigenvalues of A are λ = 1 and λ = 5 and that the 
eigenspace corresponding to the dominant eigenvalue λ = 5 is the line represented by the parametric 
equations $x_1 = t$, $x_2 = t$, which we can write in vector form as 

$$x = t\begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad (6)$$








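Setting $t = 1/\sqrt{2}$ yields the normalized dominant eigenvector 

$$v_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \approx \begin{bmatrix} 0.707106781187\ldots \\ 0.707106781187\ldots \end{bmatrix} \qquad (7)$$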


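Now let us see what happens when we use the power method, starting with the unit vector $x_0$. 

$$Ax_0 = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \qquad x_1 = \frac{Ax_0}{\|Ax_0\|} = \frac{1}{3.60555}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix}$$

$$Ax_1 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx \begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix} \qquad x_2 = \frac{Ax_1}{\|Ax_1\|} \approx \frac{1}{4.90682}\begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix} \approx \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix}$$

$$Ax_2 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx \begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix} \qquad x_3 = \frac{Ax_2}{\|Ax_2\|} \approx \frac{1}{4.99616}\begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix} \approx \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix}$$

$$Ax_3 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx \begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix} \qquad x_4 = \frac{Ax_3}{\|Ax_3\|} \approx \frac{1}{4.99985}\begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix} \approx \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix}$$

$$Ax_4 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx \begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix} \qquad x_5 = \frac{Ax_4}{\|Ax_4\|} \approx \frac{1}{4.99999}\begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix} \approx \begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix}$$

$$\lambda^{(1)} = Ax_1 \cdot x_1 = (Ax_1)^T x_1 \approx \begin{bmatrix} 3.60555 & 3.32820 \end{bmatrix}\begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx 4.84615$$

$$\lambda^{(2)} = Ax_2 \cdot x_2 = (Ax_2)^T x_2 \approx \begin{bmatrix} 3.56097 & 3.50445 \end{bmatrix}\begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx 4.99361$$

$$\lambda^{(3)} = Ax_3 \cdot x_3 = (Ax_3)^T x_3 \approx \begin{bmatrix} 3.54108 & 3.52976 \end{bmatrix}\begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx 4.99974$$

$$\lambda^{(4)} = Ax_4 \cdot x_4 = (Ax_4)^T x_4 \approx \begin{bmatrix} 3.53666 & 3.53440 \end{bmatrix}\begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx 4.99999$$

$$\lambda^{(5)} = Ax_5 \cdot x_5 = (Ax_5)^T x_5 \approx \begin{bmatrix} 3.53576 & 3.53531 \end{bmatrix}\begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix} \approx 5.00000$$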


Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue to five decimal place accuracy and $x_5$ approximates the 
dominant eigenvector in (7) correctly to three decimal place accuracy. 


It is accidental that $\lambda^{(5)}$ (the fifth approximation) 
produced five decimal place accuracy. In general, n 
iterations need not produce n decimal place 
accuracy. 
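For a 2 × 2 matrix, the claim left to the reader in Example 2 can be checked directly from the characteristic equation. As noted at the start of this section, this is not a practical route for large matrices. Below is a minimal sketch with a helper name of our own. It also confirms that the unit vector in (7) satisfies $Av = 5v$.

```python
import math


def eigenvalues_2x2_symmetric(a, b, d):
    """Roots of the characteristic equation lambda^2 - (a + d) lambda + (ad - b^2) = 0
    of the symmetric matrix [[a, b], [b, d]]; the discriminant (a - d)^2 + 4b^2 is >= 0."""
    trace, det = a + d, a * d - b * b
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace - disc) / 2, (trace + disc) / 2


print(eigenvalues_2x2_symmetric(3, 2, 3))  # (1.0, 5.0)

# The unit vector in (7) satisfies Av = 5v
v = [1 / math.sqrt(2), 1 / math.sqrt(2)]
Av = [3 * v[0] + 2 * v[1], 2 * v[0] + 3 * v[1]]
assert all(math.isclose(p, 5 * q) for p, q in zip(Av, v))
```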


The Power Method with Maximum Entry Scaling 

There is a variation of the power method in which the iterates, rather than being normalized at each stage, are scaled to 
make the maximum entry 1. To describe this method, it will be convenient to denote the maximum absolute value of the 
entries in a vector x by max(x). Thus, for example, if 

$$x = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 2 \end{bmatrix}$$

then max(x) = 7. We will need the following variation of Theorem 9.2.1. 


THEOREM 9.2.2 


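Let A be a symmetric n × n matrix with a positive dominant eigenvalue λ. If $x_0$ is a nonzero vector in $R^n$ that 
is not orthogonal to the eigenspace corresponding to λ, then the sequence 

$$x_0, \quad x_1 = \frac{Ax_0}{\max(Ax_0)}, \quad x_2 = \frac{Ax_1}{\max(Ax_1)}, \quad \ldots, \quad x_k = \frac{Ax_{k-1}}{\max(Ax_{k-1})}, \quad \ldots \qquad (8)$$

converges to an eigenvector corresponding to λ, and the sequence 

$$\frac{Ax_1 \cdot x_1}{x_1 \cdot x_1}, \quad \frac{Ax_2 \cdot x_2}{x_2 \cdot x_2}, \quad \frac{Ax_3 \cdot x_3}{x_3 \cdot x_3}, \quad \ldots, \quad \frac{Ax_k \cdot x_k}{x_k \cdot x_k}, \quad \ldots \qquad (9)$$

converges to λ. 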


In the exercises we will ask you to show that (8) can be written in the alternative form 

$$x_1 = \frac{Ax_0}{\max(Ax_0)}, \quad x_2 = \frac{A^2x_0}{\max(A^2x_0)}, \quad \ldots, \quad x_k = \frac{A^kx_0}{\max(A^kx_0)}, \quad \ldots \qquad (10)$$

which expresses the iterates in terms of the initial vector $x_0$. 


We will omit the proof of this theorem, but if we accept that (8) converges to an eigenvector of A, then it is not hard to see 
why (9) converges to the dominant eigenvalue. For this purpose we note that each term in (9) is of the form 

$$\frac{Ax \cdot x}{x \cdot x} \qquad (11)$$

which is called a Rayleigh quotient of A. In the case where λ is an eigenvalue of A and x is a corresponding eigenvector, 
the Rayleigh quotient is 

$$\frac{Ax \cdot x}{x \cdot x} = \frac{\lambda x \cdot x}{x \cdot x} = \frac{\lambda(x \cdot x)}{x \cdot x} = \lambda$$

Thus, if $x_k$ converges to a dominant eigenvector x, then it seems reasonable that 

$$\frac{Ax_k \cdot x_k}{x_k \cdot x_k} \quad \text{converges to} \quad \frac{Ax \cdot x}{x \cdot x} = \lambda$$

which is the dominant eigenvalue. 
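The Rayleigh quotient (11) is a one-line computation. The sketch below (the function name is our own) evaluates it for the matrix of Examples 2 and 3. It shows that the quotient returns the eigenvalue when x is an eigenvector, and that it does not change when x is rescaled.

```python
def rayleigh_quotient(A, x):
    """(Ax . x) / (x . x), Formula (11); x must be a nonzero vector."""
    Ax = [sum(a * c for a, c in zip(row, x)) for row in A]
    return sum(p * q for p, q in zip(Ax, x)) / sum(c * c for c in x)


A = [[3, 2], [2, 3]]
print(rayleigh_quotient(A, [1, 1]))   # 5.0: [1, 1] is an eigenvector for lambda = 5
print(rayleigh_quotient(A, [1, -1]))  # 1.0: [1, -1] is an eigenvector for lambda = 1
print(rayleigh_quotient(A, [2, 2]))   # 5.0: rescaling x does not change the quotient
```

Because (11) is unchanged when x is replaced by a nonzero multiple cx, the Rayleigh quotient of a maximum-entry-scaled iterate equals $Ax_k \cdot x_k$ for the corresponding unit iterate. This is why the two scalings produce the same eigenvalue approximations.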


Theorem 9.2.2 produces the following algorithm, called the power method with maximum entry scaling. 

The Power Method with Maximum Entry Scaling 

Step 1. Choose an arbitrary nonzero vector $x_0$. 

Step 2. Compute $Ax_0$ and multiply it by the factor $1/\max(Ax_0)$ to obtain the first approximation $x_1$ to a 
dominant eigenvector. Compute the Rayleigh quotient of $x_1$ to obtain the first approximation to the 
dominant eigenvalue. 

Step 3. Compute $Ax_1$ and scale it by the factor $1/\max(Ax_1)$ to obtain the second approximation $x_2$ to a 
dominant eigenvector. Compute the Rayleigh quotient of $x_2$ to obtain the second approximation to the 
dominant eigenvalue. 

Step 4. Compute $Ax_2$ and scale it by the factor $1/\max(Ax_2)$ to obtain the third approximation $x_3$ to a 
dominant eigenvector. Compute the Rayleigh quotient of $x_3$ to obtain the third approximation to the 
dominant eigenvalue. 

Continuing in this way will generate a sequence of better and better approximations to the dominant 
eigenvalue and a corresponding eigenvector. 
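The maximum entry scaling algorithm differs from the Euclidean version only in the scale factor and in using the Rayleigh quotient for the eigenvalue estimate. Below is a minimal pure-Python sketch under the same assumptions as Theorem 9.2.2; the function name is our own. Following the text, max(y) is the largest absolute value of the entries of y.

```python
def power_method_max_entry(A, x0, iterations):
    """Power method with maximum entry scaling: x_k = A x_{k-1} / max(A x_{k-1}).
    Returns [(x_1, lambda_1), ...] where lambda_k is the Rayleigh quotient of x_k."""
    def matvec(M, v):
        return [sum(a * b for a, b in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x = [float(c) for c in x0]  # Step 1: any nonzero starting vector
    history = []
    for _ in range(iterations):
        y = matvec(A, x)
        scale = max(abs(c) for c in y)        # max(A x_{k-1}) in the text's notation
        x = [c / scale for c in y]
        lam = dot(matvec(A, x), x) / dot(x, x)  # Rayleigh quotient (11)
        history.append((x, lam))
    return history


for k, (x, lam) in enumerate(power_method_max_entry([[3, 2], [2, 3]], [1, 0], 5), start=1):
    print(k, [round(c, 5) for c in x], round(lam, 5))
```

With the data of Example 3, the iterates should approach [1, 1] and the Rayleigh quotients should approach 5.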



John William Strutt Rayleigh (1842-1919) 


The British mathematical physicist John Rayleigh won the Nobel prize in physics in 1904 for 
his discovery of the inert gas argon. Rayleigh also made fundamental discoveries in acoustics and optics, and 
his work in wave phenomena enabled him to give the first accurate explanation of why the sky is blue. 

[Image: The Granger Collection, New York] 


EXAMPLE 3 Example 2 Revisited Using Maximum Entry Scaling 


Apply the power method with maximum entry scaling to 

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \quad \text{with} \quad x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Stop at $x_5$ and compare the resulting approximations to the exact values and to the approximations 
obtained in Example 2. 


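We leave it for you to confirm that 

$$Ax_0 = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \qquad x_1 = \frac{Ax_0}{\max(Ax_0)} = \frac{1}{3}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix}$$

$$Ax_1 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix} \approx \begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix} \qquad x_2 = \frac{Ax_1}{\max(Ax_1)} \approx \frac{1}{4.33333}\begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix}$$

$$Ax_2 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix} \approx \begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix} \qquad x_3 = \frac{Ax_2}{\max(Ax_2)} \approx \frac{1}{4.84615}\begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix}$$

$$Ax_3 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix} \approx \begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix} \qquad x_4 = \frac{Ax_3}{\max(Ax_3)} \approx \frac{1}{4.96825}\begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix}$$

$$Ax_4 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix} \approx \begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix} \qquad x_5 = \frac{Ax_4}{\max(Ax_4)} \approx \frac{1}{4.99361}\begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99936 \end{bmatrix}$$

$$\lambda^{(1)} = \frac{Ax_1 \cdot x_1}{x_1 \cdot x_1} = \frac{(Ax_1)^T x_1}{x_1^T x_1} \approx \frac{7.00000}{1.44444} \approx 4.84615$$

$$\lambda^{(2)} = \frac{Ax_2 \cdot x_2}{x_2 \cdot x_2} = \frac{(Ax_2)^T x_2}{x_2^T x_2} \approx \frac{9.24852}{1.85207} \approx 4.99361$$

$$\lambda^{(3)} = \frac{Ax_3 \cdot x_3}{x_3 \cdot x_3} = \frac{(Ax_3)^T x_3}{x_3^T x_3} \approx \frac{9.84203}{1.96851} \approx 4.99974$$

$$\lambda^{(4)} = \frac{Ax_4 \cdot x_4}{x_4 \cdot x_4} = \frac{(Ax_4)^T x_4}{x_4^T x_4} \approx \frac{9.96808}{1.99362} \approx 4.99999$$

$$\lambda^{(5)} = \frac{Ax_5 \cdot x_5}{x_5 \cdot x_5} = \frac{(Ax_5)^T x_5}{x_5^T x_5} \approx \frac{9.99360}{1.99872} \approx 5.00000$$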


Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue correctly to five decimal places and $x_5$ closely 
approximates the dominant eigenvector 

$$x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

that results by taking t = 1 in (6). 


Whereas the power method with Euclidean scaling 
produces a sequence that approaches a unit 
dominant eigenvector, maximum entry scaling 
produces a sequence that approaches an eigenvector 
whose largest component is 1. 


Rate of Convergence 

If A is a symmetric matrix whose distinct eigenvalues can be arranged so that 

$$|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_k|$$

then the “rate” at which the Rayleigh quotients converge to the dominant eigenvalue $\lambda_1$ depends on the ratio $|\lambda_1|/|\lambda_2|$; 
that is, the convergence is slow when this ratio is near 1 and rapid when it is large: the greater the ratio, the more rapid 
the convergence. For example, if A is a 2 × 2 symmetric matrix, then the greater the ratio $|\lambda_1|/|\lambda_2|$, the greater the 
disparity between the scaling effects of $\lambda_1$ and $\lambda_2$ in Figure 9.2.1, and hence the greater the effect that multiplication by 
A has on pulling the iterates toward the eigenspace of $\lambda_1$. Indeed, the rapid convergence in Example 3 is due to the fact 
that $|\lambda_1|/|\lambda_2| = 5/1 = 5$, which is considered to be a large ratio. In cases where the ratio is close to 1, the 
convergence of the power method may be so slow that other methods must be used. 
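The effect of the ratio $|\lambda_1|/|\lambda_2|$ can be seen experimentally. The sketch below compares the error in the eigenvalue estimate for the matrix of Example 3 (ratio 5) with the error for a matrix made up for this illustration, [[1.05, 0.05], [0.05, 1.05]]. Its eigenvalues 1.1 and 1.0 give a ratio of only 1.1. The helper name is our own.

```python
import math


def power_estimate(A, x0, iterations):
    """Eigenvalue estimate A x_k . x_k after `iterations` steps of the
    power method with Euclidean scaling."""
    def matvec(M, v):
        return [sum(a * b for a, b in zip(row, v)) for row in M]

    x = list(x0)
    for _ in range(iterations):
        y = matvec(A, x)
        length = math.sqrt(sum(c * c for c in y))
        x = [c / length for c in y]
    return sum(a * b for a, b in zip(matvec(A, x), x))


fast = [[3, 2], [2, 3]]              # eigenvalues 5 and 1: ratio 5
slow = [[1.05, 0.05], [0.05, 1.05]]  # eigenvalues 1.1 and 1.0: ratio 1.1
for name, A, dominant in [("ratio 5", fast, 5.0), ("ratio 1.1", slow, 1.1)]:
    errors = [abs(dominant - power_estimate(A, [1.0, 0.0], k)) for k in (1, 5, 10)]
    print(name, ["%.2e" % e for e in errors])
```

After 10 iterations the first error is at the level of floating-point roundoff, while the second is still on the order of $10^{-2}$.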


Stopping Procedures 

If λ is the exact value of the dominant eigenvalue, and if a power method produces the approximation $\lambda^{(k)}$ at the kth 
iteration, then we call 

$$\left|\frac{\lambda - \lambda^{(k)}}{\lambda}\right| \qquad (12)$$

the relative error in $\lambda^{(k)}$. If this is expressed as a percentage, then it is called the percentage error in $\lambda^{(k)}$. For 
example, if λ = 5 and the approximation after three iterations is $\lambda^{(3)} = 5.1$, then 

$$\text{relative error in } \lambda^{(3)} = \left|\frac{\lambda - \lambda^{(3)}}{\lambda}\right| = \left|\frac{5 - 5.1}{5}\right| = |-0.02| = 0.02$$

$$\text{percentage error in } \lambda^{(3)} = 0.02 \times 100\% = 2\%$$

In applications one usually knows the relative error E that can be tolerated in the dominant eigenvalue, so the goal is to 
stop computing iterates once the relative error in the approximation to that eigenvalue is less than E. However, there is a 
problem in computing the relative error from (12) in that the eigenvalue λ is unknown. To circumvent this problem, it is 
usual to estimate λ by $\lambda^{(k)}$ and stop the computations when 

$$\left|\frac{\lambda^{(k)} - \lambda^{(k-1)}}{\lambda^{(k)}}\right| < E \qquad (13)$$

The quantity on the left side of (13) is called the estimated relative error in $\lambda^{(k)}$ and its percentage form is called the 
estimated percentage error in $\lambda^{(k)}$. 
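The stopping rule (13) can be built into the power method. Below is a minimal sketch, assuming maximum entry scaling as in Example 3. The function name and the iteration cap `max_iterations` are our own additions; the cap only guards against a sequence that converges too slowly.

```python
def power_method_with_stopping(A, x0, tolerance, max_iterations=100):
    """Power method with maximum entry scaling, stopped as soon as the estimated
    relative error |(lam_k - lam_{k-1}) / lam_k| of Formula (13) is below `tolerance`.
    Returns (k, x_k, lam_k)."""
    def matvec(M, v):
        return [sum(a * b for a, b in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x = [float(c) for c in x0]
    previous = None
    for k in range(1, max_iterations + 1):
        y = matvec(A, x)
        scale = max(abs(c) for c in y)
        x = [c / scale for c in y]
        lam = dot(matvec(A, x), x) / dot(x, x)  # Rayleigh quotient of x_k
        if previous is not None and abs((lam - previous) / lam) < tolerance:
            return k, x, lam
        previous = lam
    raise RuntimeError("estimated relative error did not fall below the tolerance")


# E = 0.001, that is, an estimated percentage error below 0.1%
k, x, lam = power_method_with_stopping([[3, 2], [2, 3]], [1, 0], tolerance=0.001)
print(k, round(lam, 5))  # 4 4.99999
```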

EXAMPLE 4 Estimated Relative Error 

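For the computations in Example 3, find the smallest value of k for which the estimated percentage error 
in $\lambda^{(k)}$ is less than 0.1%. 

The estimated relative and percentage errors in the approximations in Example 3 are as follows: 

$$\lambda^{(2)}: \quad \left|\frac{\lambda^{(2)} - \lambda^{(1)}}{\lambda^{(2)}}\right| = \left|\frac{4.99361 - 4.84615}{4.99361}\right| \approx 0.02953 = 2.953\%$$

$$\lambda^{(3)}: \quad \left|\frac{\lambda^{(3)} - \lambda^{(2)}}{\lambda^{(3)}}\right| = \left|\frac{4.99974 - 4.99361}{4.99974}\right| \approx 0.00123 = 0.123\%$$

$$\lambda^{(4)}: \quad \left|\frac{\lambda^{(4)} - \lambda^{(3)}}{\lambda^{(4)}}\right| = \left|\frac{4.99999 - 4.99974}{4.99999}\right| \approx 0.00005 = 0.005\%$$

$$\lambda^{(5)}: \quad \left|\frac{\lambda^{(5)} - \lambda^{(4)}}{\lambda^{(5)}}\right| = \left|\frac{5.00000 - 4.99999}{5.00000}\right| \approx 0.00000 = 0\%$$

Thus, $\lambda^{(4)}$ = 4.99999 is the first approximation whose estimated percentage error is less than 0.1%. 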
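The estimated errors in Example 4 can be recomputed from the rounded approximations listed in Example 3. A short sketch, with variable names of our own:

```python
# lambda^(1), ..., lambda^(5) as listed (rounded) in Example 3
approximations = [4.84615, 4.99361, 4.99974, 4.99999, 5.00000]

first_below = None
for k in range(2, len(approximations) + 1):
    current, previous = approximations[k - 1], approximations[k - 2]
    estimated = abs((current - previous) / current)  # left side of Formula (13)
    print(f"lambda({k}): {estimated:.5f} = {100 * estimated:.3f}%")
    if first_below is None and 100 * estimated < 0.1:
        first_below = k
print("first k with estimated percentage error below 0.1%:", first_below)  # 4
```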


A rule for deciding when to stop an iterative process is called a stopping procedure. In the exercises, we will 
discuss stopping procedures for the power method that are based on the dominant eigenvector rather than the dominant 
eigenvalue. 


Concept Review 

Power sequence 

Dominant eigenvalue 

Dominant eigenvector 

Power method with Euclidean scaling 

Rayleigh quotient 

Power method with maximum entry scaling 

Relative error 

Percentage error 

Estimated relative error 

Estimated percentage error 

Stopping procedure 

Skills 

Identify the dominant eigenvalue of a matrix. 

Use the power methods described in this section to approximate a dominant eigenvector. 
Find the estimated relative and percentage errors associated with the power methods. 


Exercise Set 9.2 

In Exercises 1-2, the distinct eigenvalues of a matrix are given. Determine whether A has a dominant eigenvalue, and 
if so, find it. 

1. (a) $\lambda_1 = 7, \; \lambda_2 = 3, \; \lambda_3 = -8, \; \lambda_4 = 1$ 
(b) $\lambda_1 = -5, \; \lambda_2 = 3, \; \lambda_3 = 2, \; \lambda_4 = 5$ 

Answer: 

(a) $\lambda_3 = -8$ is dominant 

(b) No dominant eigenvalue 

2. (a) $\lambda_1 = 1, \; \lambda_2 = 0, \; \lambda_3 = -3, \; \lambda_4 = 2$ 

(b) $\lambda_1 = -3, \; \lambda_2 = -2, \; \lambda_3 = -1, \; \lambda_4 = 3$ 


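In Exercises 3-4, apply the power method with Euclidean scaling to the matrix A, starting with $x_0$ and stopping at $x_4$. 
Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding unit 
eigenvector. 

3. $A = \begin{bmatrix} 5 & -1 \\ -1 & -1 \end{bmatrix}$; $x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ 

Answer: 

$x_1 \approx \begin{bmatrix} 0.98058 \\ -0.19612 \end{bmatrix}$; $x_2 \approx \begin{bmatrix} 0.98837 \\ -0.15206 \end{bmatrix}$; $x_3 \approx \begin{bmatrix} 0.98679 \\ -0.16201 \end{bmatrix}$; $x_4 \approx \begin{bmatrix} 0.98715 \\ -0.15977 \end{bmatrix}$ 

dominant eigenvalue: $\lambda = 2 + \sqrt{10} \approx 5.16228$; 

dominant eigenvector: $\begin{bmatrix} 1 \\ 3 - \sqrt{10} \end{bmatrix} \approx \begin{bmatrix} 1 \\ -0.16228 \end{bmatrix}$ 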


4. 

1 

1 

O' 


T 

A = 

-2 6 

-2 

; x 0 = 

0 


0 -2 

5 


0 


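In Exercises 5-6, apply the power method with maximum entry scaling to the matrix A, starting with $x_0$ and stopping 
at $x_4$. Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding 
scaled eigenvector. 

5. $A = \begin{bmatrix} 1 & -3 \\ -3 & 5 \end{bmatrix}$; $x_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ 

Answer: 

$x_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, $\lambda^{(1)} = 6$; $x_2 = \begin{bmatrix} -0.5 \\ 1 \end{bmatrix}$, $\lambda^{(2)} = 6.6$; $x_3 \approx \begin{bmatrix} -0.53846 \\ 1 \end{bmatrix}$, $\lambda^{(3)} \approx 6.60550$; $x_4 \approx \begin{bmatrix} -0.53488 \\ 1 \end{bmatrix}$, $\lambda^{(4)} \approx 6.60555$; 

dominant eigenvalue: $\lambda = 3 + \sqrt{13} \approx 6.60555$; 

dominant eigenvector: $\dfrac{1}{\sqrt{26 + 4\sqrt{13}}}\begin{bmatrix} -3 \\ 2 + \sqrt{13} \end{bmatrix} \approx \begin{bmatrix} -0.47186 \\ 0.88167 \end{bmatrix}$ 

6. $A = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}$; $x_0 =$ 

7. Let $A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$, $x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ 

(a) Use the power method with maximum entry scaling to approximate a dominant eigenvector of A. Start with $x_0$, 
round off all computations to three decimal places, and stop after three iterations. 

(b) Use the result in part (a) and the Rayleigh quotient to approximate the dominant eigenvalue of A. 

(c) Find the exact values of the eigenvector and eigenvalue approximated in parts (a) and (b). 

(d) Find the percentage error in the approximation of the dominant eigenvalue. 

Answer: 

(a) $x_1 = \begin{bmatrix} 1 \\ -0.5 \end{bmatrix}$, $x_2 = \begin{bmatrix} 1 \\ -0.8 \end{bmatrix}$, $x_3 \approx \begin{bmatrix} 1 \\ -0.929 \end{bmatrix}$ 

(b) $\lambda^{(1)} = 2.8$, $\lambda^{(2)} \approx 2.976$, $\lambda^{(3)} \approx 2.997$ 

(c) Dominant eigenvalue: λ = 3; dominant eigenvector: $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ 

(d) 0.1% 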


8. Repeat the directions of Exercise 7 with 



"2 1 O' 


T 

A = 

1 2 0 

; x 0 = 

1 


O 

o 

o 


1 


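In Exercises 9-10, a matrix A with a dominant eigenvalue and a sequence $x_0, Ax_0, \ldots, A^5x_0$ are given. Use Formulas 
(9) and (10) to approximate the dominant eigenvalue and a corresponding eigenvector. 

9. $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$; $x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $Ax_0 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $A^2x_0 = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$, $A^3x_0 = \begin{bmatrix} 13 \\ 14 \end{bmatrix}$, $A^4x_0 = \begin{bmatrix} 41 \\ 40 \end{bmatrix}$, $A^5x_0 = \begin{bmatrix} 121 \\ 122 \end{bmatrix}$ 

Answer: 

dominant eigenvalue ≈ 2.99993; eigenvector ≈ $\begin{bmatrix} 0.99180 \\ 1.00000 \end{bmatrix}$ 

10. $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$; $x_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $Ax_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $A^2x_0 = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$, $A^3x_0 = \begin{bmatrix} 14 \\ 13 \end{bmatrix}$, $A^4x_0 = \begin{bmatrix} 40 \\ 41 \end{bmatrix}$, $A^5x_0 = \begin{bmatrix} 122 \\ 121 \end{bmatrix}$ 

11. Consider matrices 

$$A = \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad x_0 = \begin{bmatrix} a \\ b \end{bmatrix}$$

where $x_0$ is a unit vector and a ≠ 0. Show that even though the matrix A is symmetric and has a dominant 
eigenvalue, the power sequence (1) in Theorem 9.2.1 does not converge. This shows that the requirement in that 
theorem that the dominant eigenvalue be positive is essential. 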


12. Use the power method with Euclidean scaling to approximate the dominant eigenvalue and a corresponding 
eigenvector of A. Choose your own starting vector, and stop when the estimated percentage error in the eigenvalue 
approximation is less than 0.1%. 

(a) $A = \begin{bmatrix} 1 & 3 & 3 \\ 3 & 4 & -1 \\ 3 & -1 & 10 \end{bmatrix}$ 

(b) $A = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 2 & -1 & 1 \\ 1 & -1 & 4 & 1 \\ 1 & 1 & 1 & 8 \end{bmatrix}$ 

13. Repeat Exercise 12, but this time stop when all corresponding entries in two successive eigenvector approximations 
differ by less than 0.01 in absolute value. 

Answer: 


Starting with 0 , it takes 8 iterations. 

0 


Starting with , it takes 8 iterations. 


14. Repeat Exercise 12 using maximum entry scaling. 

15. Prove: If A is a nonzero n × n matrix, then $A^TA$ and $AA^T$ have positive dominant eigenvalues. 

16. (For readers familiar with proof by induction) Let A be an n × n matrix, let $x_0$ be a unit vector in $R^n$, and define 
the sequence $x_1, x_2, \ldots, x_k, \ldots$ by 

$$x_1 = \frac{Ax_0}{\|Ax_0\|}, \quad x_2 = \frac{Ax_1}{\|Ax_1\|}, \quad \ldots, \quad x_k = \frac{Ax_{k-1}}{\|Ax_{k-1}\|}, \quad \ldots$$

Prove by induction that $x_k = A^kx_0/\|A^kx_0\|$. 

17. (For readers familiar with proof by induction) Let A be an n × n matrix, let $x_0$ be a nonzero vector in $R^n$, and 
define the sequence $x_1, x_2, \ldots, x_k, \ldots$ by 

$$x_1 = \frac{Ax_0}{\max(Ax_0)}, \quad x_2 = \frac{Ax_1}{\max(Ax_1)}, \quad \ldots, \quad x_k = \frac{Ax_{k-1}}{\max(Ax_{k-1})}, \quad \ldots$$

Prove by induction that 

$$x_k = \frac{A^kx_0}{\max(A^kx_0)}$$





9.3 Internet Search Engines 

Early search engines on the Internet worked by examining key words and phrases in pages and titles of posted documents. Today's most popular search engines use algorithms 
based on the power method to analyze hyperlinks (references) between documents. In this section we will discuss one of the ways in which this is done. 

Google, the most widely used engine for searching the Internet, was developed in 1996 by Larry Page and Sergey Brin while both were graduate students at Stanford University. 
Google uses a procedure known as the PageRank algorithm to analyze how documents at relevant sites reference one another. It then assigns to each site a PageRank score, 
stores those scores as a matrix, and uses the components of the dominant eigenvector of that matrix to establish the relative importance of the sites to the search. 

Google starts by using a standard text-based search engine to find an initial set $S_0$ of sites containing relevant pages. Since words can have multiple meanings, the set $S_0$ will 
typically contain irrelevant sites and miss others of relevance. To compensate for this, the set $S_0$ is expanded to a larger set S by adjoining all sites referenced by the pages in the 
sites of $S_0$. The underlying assumption is that S will contain the most important sites relevant to the search. This process is then repeated a number of times to refine the search 
information still further. 

To be more specific, suppose that the search set S contains n sites, and define the adjacency matrix for S to be the matrix $A = [a_{ij}]$ in which 

$a_{ij} = 1$ if site i references site j 

$a_{ij} = 0$ if site i does not reference site j 

We will assume that no site references itself, so the diagonal entries of A will all be zero. 


EXAMPLE 1 Adjacency Matrices 


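Here is a typical adjacency matrix for a search set with four sites (the rows correspond to the referencing sites 1-4 and the columns to the referenced sites 1-4): 

$$A = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \qquad (1)$$

Thus, Site 1 references Sites 3 and 4, Site 2 references Site 1, and so forth. 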


There are two basic roles that a site can play in the search process—the site may be a hub, meaning that it references many other sites, or it may be an authority, meaning that it 
is referenced by many other sites. A given site will typically have both hub and authority properties in that it will both reference and be referenced. 

The term google is a variation of the word googol, which stands for the number $10^{100}$ (1 followed by 100 zeros). This term was invented by the
American mathematician Edward Kasner (1878-1955) in 1938, and the story goes that it came about when Kasner asked his eight-year-old nephew to give a name to a
really big number; he responded with “googol.” Kasner then went on to define a googolplex to be $10^{\text{googol}}$ (1 followed by a googol zeros).


In general, if $A$ is an adjacency matrix for $n$ sites, then the column sums of $A$ measure the authority aspect of the sites and the row sums of $A$ measure their hub aspect. For
example, the column sums of the matrix in (1) are 3, 1, 2, and 2, which means that Site 1 is referenced by three other sites, Site 2 is referenced by one other site, and so forth.
Similarly, the row sums of the matrix in (1) are 2, 1, 2, and 3, so Site 1 references two other sites, Site 2 references one other site, and so forth.

Accordingly, if $A$ is an adjacency matrix, then we call the vector $\mathbf{h}_0$ of row sums of $A$ the initial hub vector of $A$, and we call the vector $\mathbf{a}_0$ of column sums of $A$ the initial
authority vector of $A$. Alternatively, we can think of $\mathbf{a}_0$ as the vector of row sums of $A^T$, which turns out to be more convenient for computations. The entries in the hub vector
are called hub weights and those in the authority vector authority weights.


EXAMPLE 2 Initial Hub and Authority Vectors of an Adjacency Matrix 

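Find the initial hub and authority vectors for the adjacency matrix $A$ in Example 1.

The row sums of $A$ yield the initial hub vector

$$\mathbf{h}_0 = \begin{bmatrix} 2 \\ 1 \\ 2 \\ 3 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \qquad (2)$$

and the row sums of $A^T$ (the column sums of $A$) yield the initial authority vector

$$\mathbf{a}_0 = \begin{bmatrix} 3 \\ 1 \\ 2 \\ 2 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \qquad (3)$$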
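The row and column sums in Example 2 are easy to check by machine. The following is a minimal Python sketch using plain lists. The function name `initial_hub_authority` is our own choice for illustration and is not part of any library.

```python
def initial_hub_authority(A):
    """Return (h0, a0) for a square adjacency matrix A given as a list of rows.

    h0[i] is the sum of row i (how many sites Site i+1 references);
    a0[j] is the sum of column j (how many sites reference Site j+1).
    """
    n = len(A)
    h0 = [sum(row) for row in A]
    a0 = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return h0, a0


# Adjacency matrix (1) from Example 1: rows are referencing sites,
# columns are referenced sites.
A = [[0, 0, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [1, 1, 1, 0]]

h0, a0 = initial_hub_authority(A)
print(h0)  # [2, 1, 2, 3]
print(a0)  # [3, 1, 2, 2]
```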


The link counting in Example 2 suggests that Site 4 is the major hub and Site 1 is the greatest authority. However, counting links does not tell the whole story. For example, it
seems reasonable that if Site 1 is to be considered the greatest authority, then more weight should be given to hubs that link to that site, and if Site 4 is to be considered a major
hub, then more weight should be given to sites to which it links. Thus, there is an interaction between hubs and authorities that needs to be accounted for in the search process.
Accordingly, once the search engine has calculated the initial authority vector $\mathbf{a}_0$, it then uses the information in that vector to create new hub and authority vectors $\mathbf{h}_1$ and $\mathbf{a}_1$
using the formulas


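$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|} \quad \text{and} \quad \mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|} \qquad (4)$$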


The numerators in these formulas do the weighting, and the normalization serves to control the size of the entries. To understand how the numerators accomplish the weighting,
view the product $A\mathbf{a}_0$ as a linear combination of the column vectors of $A$ with coefficients from $\mathbf{a}_0$. For example, with the adjacency matrix in Example 1 and the authority vector
calculated in Example 2 we have

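$$A\mathbf{a}_0 = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 3 \\ 1 \\ 2 \\ 2 \end{bmatrix} = 3\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 5 \\ 6 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix}$$

(the columns of $A$ correspond to referenced Sites 1 to 4)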


Thus, we see that the links to each referenced site are weighted by the authority values in $\mathbf{a}_0$. To control the size of the entries, the search engine normalizes $A\mathbf{a}_0$ to produce the
updated hub vector


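$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|} = \frac{1}{\sqrt{86}}\begin{bmatrix} 4 \\ 3 \\ 5 \\ 6 \end{bmatrix} \approx \begin{bmatrix} 0.43133 \\ 0.32350 \\ 0.53916 \\ 0.64700 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix}$$

New Hub Weights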


The new hub vector $\mathbf{h}_1$ can now be used to update the authority vector using Formula (4). The product $A^T\mathbf{h}_1$ performs the weighting, and the normalization controls the size:

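$$A^T\mathbf{h}_1 \approx \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 0.43133 \\ 0.32350 \\ 0.53916 \\ 0.64700 \end{bmatrix} = 0.43133\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} + 0.32350\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + 0.53916\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 0.64700\begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1.50966 \\ 0.64700 \\ 1.07833 \\ 0.97049 \end{bmatrix}$$

(the columns of $A^T$ correspond to referencing Sites 1 to 4)

$$\mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|} \approx \frac{1}{2.19142}\begin{bmatrix} 1.50966 \\ 0.64700 \\ 1.07833 \\ 0.97049 \end{bmatrix} \approx \begin{bmatrix} 0.68889 \\ 0.29524 \\ 0.49207 \\ 0.44286 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix}$$

New Authority Weights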
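The single update in Formula (4) can be sketched in a few lines of Python. This is a minimal illustration with plain lists that uses the Euclidean norm, as the text does. The helper names `matvec`, `transpose`, and `normalize` are our own.

```python
import math

def matvec(M, x):
    """Product of matrix M (list of rows) and vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def normalize(x):
    """Scale x to unit Euclidean length."""
    length = math.sqrt(sum(xi * xi for xi in x))
    return [xi / length for xi in x]

A = [[0, 0, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [1, 1, 1, 0]]
a0 = [3, 1, 2, 2]                          # initial authority vector (3)

h1 = normalize(matvec(A, a0))              # first formula in (4)
a1 = normalize(matvec(transpose(A), h1))   # second formula in (4)
# h1 is approximately (0.43133, 0.32350, 0.53916, 0.64700)
# a1 is approximately (0.68889, 0.29524, 0.49207, 0.44286)
```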


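Once the updated hub and authority vectors, $\mathbf{h}_1$ and $\mathbf{a}_1$, are obtained, the search engine repeats the process and computes a succession of hub and authority vectors, thereby
generating the interrelated sequences

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|},\quad \mathbf{h}_2 = \frac{A\mathbf{a}_1}{\|A\mathbf{a}_1\|},\quad \mathbf{h}_3 = \frac{A\mathbf{a}_2}{\|A\mathbf{a}_2\|},\quad \ldots,\quad \mathbf{h}_k = \frac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|},\ \ldots \qquad (5)$$

$$\mathbf{a}_0,\quad \mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|},\quad \mathbf{a}_2 = \frac{A^T\mathbf{h}_2}{\|A^T\mathbf{h}_2\|},\quad \mathbf{a}_3 = \frac{A^T\mathbf{h}_3}{\|A^T\mathbf{h}_3\|},\quad \ldots,\quad \mathbf{a}_k = \frac{A^T\mathbf{h}_k}{\|A^T\mathbf{h}_k\|},\ \ldots \qquad (6)$$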


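However, each of these is a power sequence in disguise. For example, if we substitute the expression for $\mathbf{h}_k$ into the expression for $\mathbf{a}_k$, then we obtain

$$\mathbf{a}_k = \frac{A^T\mathbf{h}_k}{\|A^T\mathbf{h}_k\|} = \frac{A^T\left(\dfrac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|}\right)}{\left\|A^T\left(\dfrac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|}\right)\right\|} = \frac{(A^TA)\mathbf{a}_{k-1}}{\|(A^TA)\mathbf{a}_{k-1}\|}$$

since the positive scalar $1/\|A\mathbf{a}_{k-1}\|$ cancels from the numerator and denominator. This means that we can rewrite (6) as

$$\mathbf{a}_0,\quad \mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|},\quad \mathbf{a}_2 = \frac{(A^TA)\mathbf{a}_1}{\|(A^TA)\mathbf{a}_1\|},\quad \ldots,\quad \mathbf{a}_k = \frac{(A^TA)\mathbf{a}_{k-1}}{\|(A^TA)\mathbf{a}_{k-1}\|},\ \ldots \qquad (7)$$

Similarly, we can rewrite (5) as

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|},\quad \mathbf{h}_2 = \frac{(AA^T)\mathbf{h}_1}{\|(AA^T)\mathbf{h}_1\|},\quad \ldots,\quad \mathbf{h}_k = \frac{(AA^T)\mathbf{h}_{k-1}}{\|(AA^T)\mathbf{h}_{k-1}\|},\ \ldots \qquad (8)$$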


In Exercise 15 of Section 9.2 you were asked to show that $A^TA$ and $AA^T$ both have positive dominant eigenvalues. That being the case, Theorem 9.2.1 ensures that (7)
and (8) converge to the dominant eigenvectors of $A^TA$ and $AA^T$, respectively. The entries in those eigenvectors are the authority and hub weights that Google uses to rank the
search sites in order of importance as hubs and authorities.
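The claim that (5) and (6) are power sequences in disguise can be checked numerically. The following minimal Python sketch runs the alternating updates (5)-(6) and the power sequence (7) on the adjacency matrix of Example 1, then compares the results. The iteration count of 25 and the helper names are our own illustrative choices. The Rayleigh quotient at the end is a standard estimate of the dominant eigenvalue and is not part of the text's ranking procedure.

```python
import math

def matvec(M, x):
    """Product of matrix M (list of rows) and vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Product of matrices M and N, both given as lists of rows."""
    cols = transpose(N)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

def normalize(x):
    length = math.sqrt(sum(xi * xi for xi in x))
    return [xi / length for xi in x]

A = [[0, 0, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [1, 1, 1, 0]]
At = transpose(A)
AtA = matmul(At, A)
a0 = [3, 1, 2, 2]                 # initial authority vector (3)

# Sequences (5) and (6): alternate hub and authority updates.
a_alt = a0
for _ in range(25):
    h = normalize(matvec(A, a_alt))
    a_alt = normalize(matvec(At, h))

# Sequence (7): the power method applied directly to A^T A.
a_pow = a0
for _ in range(25):
    a_pow = normalize(matvec(AtA, a_pow))

# Rayleigh quotient estimate of the dominant eigenvalue, and the
# length of (A^T A)a - lam*a as a check that a_pow is (nearly) an eigenvector.
Ma = matvec(AtA, a_pow)
lam = sum(x * y for x, y in zip(a_pow, Ma))
residual = math.sqrt(sum((y - lam * x) ** 2 for x, y in zip(a_pow, Ma)))
```

Both loops end at the same vector up to roundoff, which is the numerical counterpart of the substitution argument above.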


EXAMPLE 3 A Ranking Procedure 


Suppose that a search engine produces 10 Internet sites in its search set and that the adjacency matrix for those sites is

A = [10 x 10 adjacency matrix; rows: referencing Sites 1 to 10; columns: referenced Sites 1 to 10]


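Use Formula (7) to rank the sites in decreasing order of authority.

Solution: We will take $\mathbf{a}_0$ to be the normalized vector of column sums of $A$, and then we will compute the iterates in (7) until the authority vectors seem to
stabilize. We leave it for you to show that

$$\mathbf{a}_0 = \frac{1}{\sqrt{54}}\begin{bmatrix} 0 \\ 2 \\ 1 \\ 1 \\ 5 \\ 3 \\ 1 \\ 3 \\ 0 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0.27217 \\ 0.13608 \\ 0.13608 \\ 0.68041 \\ 0.40825 \\ 0.13608 \\ 0.40825 \\ 0 \\ 0.27217 \end{bmatrix}$$

and that

$$(A^TA)\mathbf{a}_0 \approx \begin{bmatrix} 0&0&0&0&0&0&0&0&0&0 \\ 0&2&1&1&2&0&0&2&0&1 \\ 0&1&1&1&1&0&0&1&0&1 \\ 0&1&1&1&1&0&0&1&0&1 \\ 0&2&1&1&5&0&0&2&0&1 \\ 0&0&0&0&0&3&1&0&0&0 \\ 0&0&0&0&0&1&1&0&0&0 \\ 0&2&1&1&2&0&0&3&0&1 \\ 0&0&0&0&0&0&0&0&0&0 \\ 0&1&1&1&1&0&0&1&0&2 \end{bmatrix}\begin{bmatrix} 0 \\ 0.27217 \\ 0.13608 \\ 0.13608 \\ 0.68041 \\ 0.40825 \\ 0.13608 \\ 0.40825 \\ 0 \\ 0.27217 \end{bmatrix} \approx \begin{bmatrix} 0 \\ 3.26599 \\ 1.90516 \\ 1.90516 \\ 5.30723 \\ 1.36083 \\ 0.54433 \\ 3.67423 \\ 0 \\ 2.17732 \end{bmatrix}$$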


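Thus,

$$\mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|} \approx \frac{1}{8.15362}\begin{bmatrix} 0 \\ 3.26599 \\ 1.90516 \\ 1.90516 \\ 5.30723 \\ 1.36083 \\ 0.54433 \\ 3.67423 \\ 0 \\ 2.17732 \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0.40056 \\ 0.23366 \\ 0.23366 \\ 0.65090 \\ 0.16690 \\ 0.06676 \\ 0.45063 \\ 0 \\ 0.26704 \end{bmatrix}$$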


Continuing in this way yields the following authority iterates: 


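$$\mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|},\quad \mathbf{a}_2 = \frac{(A^TA)\mathbf{a}_1}{\|(A^TA)\mathbf{a}_1\|},\quad \mathbf{a}_3 = \frac{(A^TA)\mathbf{a}_2}{\|(A^TA)\mathbf{a}_2\|},\quad \mathbf{a}_4 = \frac{(A^TA)\mathbf{a}_3}{\|(A^TA)\mathbf{a}_3\|},\quad \ldots,\quad \mathbf{a}_9 = \frac{(A^TA)\mathbf{a}_8}{\|(A^TA)\mathbf{a}_8\|},\quad \mathbf{a}_{10} = \frac{(A^TA)\mathbf{a}_9}{\|(A^TA)\mathbf{a}_9\|}$$

$$\begin{array}{l|ccccccc}
 & \mathbf{a}_0 & \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{a}_4 & \mathbf{a}_9 & \mathbf{a}_{10} \\ \hline
\text{Site 1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Site 2} & 0.27217 & 0.40056 & 0.41652 & 0.41918 & 0.41973 & 0.41990 & 0.41990 \\
\text{Site 3} & 0.13608 & 0.23366 & 0.24917 & 0.25233 & 0.25309 & 0.25337 & 0.25337 \\
\text{Site 4} & 0.13608 & 0.23366 & 0.24917 & 0.25233 & 0.25309 & 0.25337 & 0.25337 \\
\text{Site 5} & 0.68041 & 0.65090 & 0.63407 & 0.62836 & 0.62665 & 0.62597 & 0.62597 \\
\text{Site 6} & 0.40825 & 0.16690 & 0.06322 & 0.02372 & 0.00889 & 0.00007 & 0.00002 \\
\text{Site 7} & 0.13608 & 0.06676 & 0.02603 & 0.00981 & 0.00368 & 0.00003 & 0.00001 \\
\text{Site 8} & 0.40825 & 0.45063 & 0.46672 & 0.47050 & 0.47137 & 0.47165 & 0.47165 \\
\text{Site 9} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Site 10} & 0.27217 & 0.26704 & 0.27892 & 0.28300 & 0.28416 & 0.28460 & 0.28460
\end{array}$$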


The small changes between $\mathbf{a}_9$ and $\mathbf{a}_{10}$ suggest that the iterates have stabilized near a dominant eigenvector of $A^TA$. From the entries in $\mathbf{a}_{10}$ we conclude that Sites
1, 6, 7, and 9 are probably irrelevant to the search and that the remaining sites should be searched in order of decreasing importance as

Site 5, Site 8, Site 2, Site 10, Sites 3 and 4 (a tie)
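Formula (7) involves $A$ only through $A^TA$, so Example 3 can be reproduced starting from the matrix $A^TA$ displayed in the computation of $(A^TA)\mathbf{a}_0$. For a 0-1 matrix, the diagonal entries of $A^TA$ are the column sums of $A$. The following minimal Python sketch makes that assumption explicit. It uses 10 iterations to match $\mathbf{a}_{10}$, and the helper names are our own.

```python
import math

# A^T A for Example 3, as displayed in the computation of (A^T A)a0.
AtA = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 1, 2, 0, 0, 2, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 1],
    [0, 2, 1, 1, 5, 0, 0, 2, 0, 1],
    [0, 0, 0, 0, 0, 3, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 2, 1, 1, 2, 0, 0, 3, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 2],
]
# For a 0-1 adjacency matrix, the diagonal of A^T A holds the column sums of A.
column_sums = [AtA[i][i] for i in range(10)]

def normalize(x):
    length = math.sqrt(sum(v * v for v in x))
    return [v / length for v in x]

def authority_iterates(AtA, a0, steps):
    """Return [a0, a1, ..., a_steps] from Formula (7), starting from normalized a0."""
    iterates = [normalize(a0)]
    for _ in range(steps):
        prev = iterates[-1]
        product = [sum(m * p for m, p in zip(row, prev)) for row in AtA]
        iterates.append(normalize(product))
    return iterates

a = authority_iterates(AtA, column_sums, 10)
a10 = a[10]

# Site numbers (1-based) by decreasing authority weight; sorted() is stable,
# so tied sites keep their original order.
ranking = sorted(range(1, 11), key=lambda site: a10[site - 1], reverse=True)
print(ranking)  # [5, 8, 2, 10, 3, 4, 6, 7, 1, 9]
```

The first six entries give the ranking stated above. The last four correspond to the sites judged irrelevant, whose weights are zero or nearly zero.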


Concept Review 

Adjacency matrix 
Hub vector 
Authority vector 
Hub weights 
Authority weights 
Skills 

Find the initial hub and authority vectors of an adjacency matrix. 
Use the method of Example 3 to rank sites. 


Exercise Set 9.3 


In Exercises 1-2, find the initial hub and authority vectors for the given adjacency matrix A. 

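1. Adjacency matrix (rows: referencing sites 1 to 3; columns: referenced sites 1 to 3):

$$A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

Answer:

$$\mathbf{h}_0 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix},\quad \mathbf{a}_0 = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}$$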


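2. Adjacency matrix (rows: referencing sites 1 to 4; columns: referenced sites 1 to 4):

$$A = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}$$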


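In Exercises 3-4, find the updated hub and authority vectors $\mathbf{h}_1$ and $\mathbf{a}_1$ for the adjacency matrix $A$.
3. The matrix in Exercise 1.

Answer:

$$\mathbf{h}_1 \approx \begin{bmatrix} 0.39057 \\ 0.65094 \\ 0.65094 \end{bmatrix},\quad \mathbf{a}_1 \approx \begin{bmatrix} 0.60971 \\ 0 \\ 0.79262 \end{bmatrix}$$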


4. The matrix in Exercise 2. 

In Exercises 5-8, the adjacency matrix A of an Internet search engine is given. Use the method of Example 3 to rank the sites in decreasing order of authority. 

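5. Adjacency matrix (rows: referencing sites 1 to 4; columns: referenced sites 1 to 4):

$$A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$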

Answer: 


Sites 1 and 2 (tie); sites 3 and 4 are irrelevant 

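6. Adjacency matrix (rows: referencing sites 1 to 4; columns: referenced sites 1 to 4):

$$A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$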


















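7. Adjacency matrix (rows: referencing sites 1 to 5; columns: referenced sites 1 to 5):

$$A = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}$$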


Answer: 


Site 2, site 3, site 4; sites 1 and 5 are irrelevant 

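8. Adjacency matrix (rows: referencing sites 1 to 10; columns: referenced sites 1 to 10):

$$A = \begin{bmatrix}
0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}$$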
9.4 Comparison of Procedures for Solving Linear Systems

There is an old saying that “time is money.” This is especially true in industry where the cost of solving a linear 
system is generally determined by the time it takes for a computer to perform the required computations. This 
typically depends both on the speed of the computer processor and on the number of operations required by the 
algorithm. Thus, choosing the right algorithm has important financial implications in an industrial or research setting.
In this section we will discuss some of the factors that affect the choice of algorithms for solving large-scale linear 
systems. 


Flops and the Cost of Solving a Linear System 

In computer jargon, an arithmetic operation (+, −, ×, ÷) on two real numbers is called a flop, which is an acronym
for “floating-point operation.” The total number of flops required to solve a problem, which is called the cost of the
solution, provides a convenient way of choosing between various algorithms for solving the problem. When needed,
the cost in flops can be converted to units of time or money if the speed of the computer processor and the financial
aspects of its operation are known. For example, many of today’s personal computers are capable of performing in
excess of 10 gigaflops per second (1 gigaflop = $10^9$ flops). Thus, an algorithm that costs 1,000,000 flops would be
executed in 0.0001 seconds.

To illustrate how costs (in flops) can be computed, let us count the number of flops required to solve a linear system 
of n equations in n unknowns by Gauss-Jordan elimination. For this purpose we will need the following formulas for 
the sum of the first n positive integers and the sum of the squares of the first n positive integers: 

$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \qquad (1)$$

$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \qquad (2)$$


Let $A\mathbf{x} = \mathbf{b}$ be a linear system of $n$ equations in $n$ unknowns to be solved by Gauss-Jordan elimination (or,
equivalently, by Gaussian elimination with back substitution). For simplicity, let us assume that $A$ is invertible and
that no row interchanges are required to reduce the augmented matrix $[A \mid \mathbf{b}]$ to row echelon form. The diagrams that
accompany the following analysis provide a convenient way of counting the operations required to introduce a
leading 1 in the first row and then zeros below it. In our operation counts, we will lump divisions and multiplications
together as “multiplications,” and we will lump additions and subtractions together as “additions.”

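Step 1. It requires $n$ flops (multiplications) to introduce the leading 1 in the first row.

$$\begin{bmatrix} 1 & \times & \times & \cdots & \times & \times \\ \bullet & \bullet & \bullet & \cdots & \bullet & \bullet \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \bullet & \bullet & \bullet & \cdots & \bullet & \bullet \end{bmatrix}$$

Here $\times$ denotes a quantity that is being computed, $\bullet$ denotes a quantity that is not being computed, and the augmented matrix has size $n \times (n+1)$.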

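Step 2. It requires $n$ multiplications and $n$ additions to introduce a zero below the leading 1, and there are $n-1$ rows
below the leading 1, so the number of flops required to introduce zeros below the leading 1 is $2n(n-1)$.

$$\begin{bmatrix} 1 & \bullet & \bullet & \cdots & \bullet & \bullet \\ 0 & \times & \times & \cdots & \times & \times \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & \times & \times & \cdots & \times & \times \end{bmatrix}$$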

Combining Steps 1 and 2, the number of flops required for column 1 is

$$n + 2n(n-1) = 2n^2 - n$$

The procedure for column 2 is the same as for column 1, except that now we are
dealing with one fewer row and one fewer column. Thus, the number of flops
required to introduce the leading 1 in row 2 and the zeros below it can be obtained
by replacing $n$ by $n-1$ in the flop count for the first column. Hence, the number of
flops required for column 2 is

$$2(n-1)^2 - (n-1)$$

By the argument for column 2, the number of flops required for column 3 is

$$2(n-2)^2 - (n-2)$$

The pattern should now be clear. The total number of flops required to create the $n$
leading 1's and the associated zeros is

$$(2n^2 - n) + [2(n-1)^2 - (n-1)] + [2(n-2)^2 - (n-2)] + \cdots + (2 - 1)$$

which we can rewrite as

$$2[n^2 + (n-1)^2 + \cdots + 1] - [n + (n-1) + \cdots + 1]$$

or on applying Formulas (1) and (2) as

$$2\cdot\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} = \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n$$

Next, let us count the number of operations required to complete the backward
phase (the back substitution).

Step 1. It requires $n-1$ multiplications and $n-1$ additions to introduce zeros above the
leading 1 in the $n$th column, so the total number of flops required for the column
is $2(n-1)$.

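$$\begin{bmatrix} 1 & \bullet & \bullet & \cdots & \bullet & 0 & \times \\ 0 & 1 & \bullet & \cdots & \bullet & 0 & \times \\ 0 & 0 & 1 & \cdots & \bullet & 0 & \times \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 & \times \\ 0 & 0 & 0 & \cdots & 0 & 1 & \bullet \end{bmatrix}$$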

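Step 2. The procedure is the same as for Step 1, except that now we are dealing with one
fewer row. Thus, the number of flops required for the $(n-1)$st column is $2(n-2)$.

$$\begin{bmatrix} 1 & \bullet & \bullet & \cdots & 0 & 0 & \times \\ 0 & 1 & \bullet & \cdots & 0 & 0 & \times \\ 0 & 0 & 1 & \cdots & 0 & 0 & \times \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 & \bullet \\ 0 & 0 & 0 & \cdots & 0 & 1 & \bullet \end{bmatrix}$$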


By the argument for column $(n-1)$, the number of flops required for column
$(n-2)$ is $2(n-3)$.

The pattern should now be clear. The total number of flops to complete the
backward phase is

$$2(n-1) + 2(n-2) + 2(n-3) + \cdots + 2(n-n) = 2[n^2 - (1 + 2 + \cdots + n)]$$

which we can rewrite using Formula (1) as

$$2\left[n^2 - \frac{n(n+1)}{2}\right] = n^2 - n$$


In summary, we have shown that for Gauss-Jordan elimination the number of flops required for the forward and
backward phases is

$$\text{flops for forward phase} = \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n \qquad (3)$$

$$\text{flops for backward phase} = n^2 - n \qquad (4)$$

Thus, the total cost of solving a linear system by Gauss-Jordan elimination is

$$\text{flops for both phases} = \frac{2}{3}n^3 + \frac{3}{2}n^2 - \frac{7}{6}n \qquad (5)$$
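Formulas (3) to (5) can be checked by literally counting operations. The following minimal Python sketch walks through the column-by-column steps described above under the same assumptions. There are no row interchanges, divisions count as multiplications, and no flops are charged for entries known to become 1 or 0. It compares the totals with the closed forms using exact rational arithmetic from the standard `fractions` module. The function names are our own.

```python
from fractions import Fraction

def gauss_jordan_flops(n):
    """Count (forward, backward) flops for Gauss-Jordan on an n x (n+1) augmented matrix."""
    forward = 0
    for col in range(n):                     # 0-based pivot column
        width = n - col                      # entries right of the pivot, including the b column
        forward += width                     # divide the pivot row to create the leading 1
        rows_below = n - 1 - col
        forward += rows_below * 2 * width    # one multiply and one add per entry, per row below
    backward = 0
    for col in range(n - 1, 0, -1):          # columns n, n-1, ..., 2 in 1-based terms
        rows_above = col                     # rows above the leading 1 in this column
        backward += rows_above * 2           # only the b entry changes: one multiply, one add
    return forward, backward

def forward_formula(n):    # Formula (3)
    return Fraction(2, 3) * n**3 + Fraction(1, 2) * n**2 - Fraction(1, 6) * n

def backward_formula(n):   # Formula (4)
    return n**2 - n

def total_formula(n):      # Formula (5)
    return Fraction(2, 3) * n**3 + Fraction(3, 2) * n**2 - Fraction(7, 6) * n

for n in range(1, 51):
    f, b = gauss_jordan_flops(n)
    assert f == forward_formula(n) and b == backward_formula(n)
    assert f + b == total_formula(n)

print(gauss_jordan_flops(3))  # (22, 6)
```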


Cost Estimates for Solving Large Linear Systems 


It is a property of polynomials that for large values of the independent variable the term of highest power makes the
major contribution to the value of the polynomial. Thus, for large linear systems we can use (3) and (4) to approximate
the number of flops in the forward and backward phases as

$$\text{flops for forward phase} \approx \frac{2}{3}n^3 \qquad (6)$$

$$\text{flops for backward phase} \approx n^2 \qquad (7)$$


This shows that it is more costly to execute the forward phase than the backward phase for large linear systems. 






Indeed, the cost difference between the forward and backward phases can be enormous, as the next example shows. 


EXAMPLE 1 Cost of Solving a Large Linear System 

Approximate the time required to execute the forward and backward phases of Gauss-Jordan
elimination for a system of 10,000 ($= 10^4$) equations in 10,000 unknowns using a computer that can
execute 10 gigaflops per second.

We have $n = 10^4$ for the given system, so from (6) and (7) the number of gigaflops required
for the forward and backward phases is

$$\text{gigaflops for forward phase} \approx \frac{2}{3}n^3 \times 10^{-9} = \frac{2}{3}(10^4)^3 \times 10^{-9} = \frac{2}{3}\times 10^3$$

$$\text{gigaflops for backward phase} \approx n^2 \times 10^{-9} = (10^4)^2 \times 10^{-9} = 10^{-1}$$

Thus, at 10 gigaflops/s the execution times for the forward and backward phases are

$$\text{time for forward phase} \approx \left(\frac{2}{3}\times 10^3\right) \times 10^{-1}\ \text{s} \approx 66.67\ \text{s}$$

$$\text{time for backward phase} \approx (10^{-1}) \times 10^{-1}\ \text{s} = 0.01\ \text{s}$$


We leave it as an exercise for you to confirm the results in Table 1. 


Table 1 


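Approximate Cost for an n x n Matrix A with Large n

$$\begin{array}{l|l}
\text{Algorithm} & \text{Cost in Flops} \\ \hline
\text{Gauss-Jordan elimination (forward phase)} & \approx \tfrac{2}{3}n^3 \\
\text{Gauss-Jordan elimination (backward phase)} & \approx n^2 \\
LU\text{-decomposition of } A & \approx \tfrac{2}{3}n^3 \\
\text{Forward substitution to solve } L\mathbf{y} = \mathbf{b} & \approx n^2 \\
\text{Backward substitution to solve } U\mathbf{x} = \mathbf{y} & \approx n^2 \\
A^{-1} \text{ by reducing } [A \mid I] \text{ to } [I \mid A^{-1}] & \approx 2n^3 \\
\text{Compute } A^{-1}\mathbf{b} & \approx 2n^3
\end{array}$$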
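The entries of Table 1 convert directly into time estimates. The following minimal Python sketch assumes the leading-term approximations of Table 1 and a machine that sustains a fixed number of gigaflops per second. The dictionary keys are our own labels for the table rows.

```python
# Leading-term flop counts from Table 1 (meaningful only for large n).
TABLE_1 = {
    "Gauss-Jordan, forward phase": lambda n: 2 * n**3 / 3,
    "Gauss-Jordan, backward phase": lambda n: n**2,
    "LU-decomposition of A": lambda n: 2 * n**3 / 3,
    "Forward substitution, Ly = b": lambda n: n**2,
    "Backward substitution, Ux = y": lambda n: n**2,
    "A^-1 by reducing [A | I]": lambda n: 2 * n**3,
    "Compute A^-1 b": lambda n: 2 * n**3,
}

def estimated_seconds(flops, gigaflops_per_second):
    """Time in seconds for a given flop count on a machine of the given speed."""
    return flops / (gigaflops_per_second * 1e9)

n = 10_000
for task, cost in TABLE_1.items():
    print(f"{task:32s} {estimated_seconds(cost(n), 10):10.4f} s")
```

At 10 gigaflops per second the first two rows come out near 66.67 s and 0.01 s, in agreement with Example 1.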


Considerations in Choosing an Algorithm for Solving a Linear System 

For a single linear system Ax = b of n equations in n unknowns, the methods of LU-decomposition and Gauss-Jordan
elimination differ in bookkeeping but otherwise involve the same number of flops. Thus, neither method has
a cost advantage over the other. However, LU-decomposition has other advantages that make it the method of
choice:

Gauss-Jordan elimination and Gaussian elimination both use the augmented matrix [A | b], so b must be known.
In contrast, LU-decomposition uses only the matrix A, so once that decomposition is known it can be used with as
many right-hand sides as are required, one at a time.

The LU-decomposition that is computed to solve Ax = b can be used to compute $A^{-1}$, if needed, with little
additional work.

For large linear systems in which computer memory is at a premium, one can dispense with the storage of the 1's
and zeros that appear on or below the main diagonal of U, since those entries are known from the form of U. The
space that this opens up can then be used to store the entries of L, thereby reducing the amount of memory
required to solve the system.

If A is a large matrix consisting mostly of zeros, and if the nonzero entries are concentrated in a “band” around the
main diagonal, then there are techniques that can be used to reduce the cost of LU-decomposition, giving it an
advantage over Gauss-Jordan elimination.

The cost in flops for Gaussian elimination is the 
same as that for the forward phase of Gauss- 
Jordan elimination. 


Concept Review 

Flop 

Formula for the sum of the first n positive integers 
Formula for the sum of the squares of the first n positive integers 
Cost in flops for solving large linear systems by various methods 
Cost in flops for inverting a matrix by row reduction 

Issues to consider when choosing an algorithm to solve a large linear system 

Skills 

Compute the cost of solving a linear system by Gauss-Jordan elimination. 

Approximate the time required to execute the forward and backward phases of Gauss-Jordan elimination. 
Approximate the time required to find an LU-decomposition of a matrix.

Approximate the time required to find the inverse of an invertible matrix. 


Exercise Set 9.4 

1. A certain computer can execute 10 gigaflops per second. Use Formula 5 to find the time required to solve the 
system using Gauss-Jordan elimination. 

(a) A system of 1000 equations in 1000 unknowns. 

(b) A system of 10,000 equations in 10,000 unknowns. 

(c) A system of 100,000 equations in 100,000 unknowns. 


Answer: 


(a) ≈ 0.067 second

(b) ≈ 66.68 seconds

(c) ≈ 66,668 seconds, or about 18.5 hours

2. A certain computer can execute 100 gigaflops per second. Use Formula 5 to find the time required to solve the 
system using Gauss-Jordan elimination. 

(a) A system of 10,000 equations in 10,000 unknowns. 

(b) A system of 100,000 equations in 100,000 unknowns. 

(c) A system of 1,000,000 equations in 1,000,000 unknowns. 

3. Today's personal computers can execute 70 gigaflops per second. Use Table 1 to estimate the time required to 
perform the following operations on the invertible 10,000 x 10,000 matrix A. 

(a) Execute the forward phase of Gauss-Jordan elimination. 

(b) Execute the backward phase of Gauss-Jordan elimination. 

(c) LU-decomposition of A.

(d) Find $A^{-1}$ by reducing [A | I] to [I | $A^{-1}$].

Answer:

(a) ≈ 9.52 seconds

(b) ≈ 0.0014 second

(c) ≈ 9.52 seconds

(d) ≈ 28.6 seconds

4. The IBM Roadrunner computer can operate at speeds in excess of 1 petaflop per second (1 petaflop = $10^{15}$
flops). Use Table 1 to estimate the time required to perform the following operations on the invertible
100,000 x 100,000 matrix A.

(a) Execute the forward phase of Gauss-Jordan elimination.

(b) Execute the backward phase of Gauss-Jordan elimination.

(c) LU-decomposition of A.

(d) Find $A^{-1}$ by reducing [A | I] to [I | $A^{-1}$].

5. (a) Approximate the time required to execute the forward phase of Gauss-Jordan elimination for a system of
100,000 equations in 100,000 unknowns using a computer that can execute 1 gigaflop per second. Do the
same for the backward phase. (See Table 1.)

(b) How many gigaflops per second must a computer be able to execute to find the LU-decomposition of a
matrix of size 10,000 x 10,000 in less than 0.5 s? (See Table 1.)

Answer:

(a) $6.67 \times 10^5$ s for forward phase, 10 s for backward phase

(b) 1334



6. About how many teraflops per second must a computer be able to execute to find the inverse of a matrix of size
100,000 x 100,000 in less than 0.5 s? (1 teraflop = $10^{12}$ flops.)

In Exercises 7-10, A and B are n x n matrices and c is a real number. 

7. How many flops are required to compute cA?
Answer:

$n^2$ flops

8. How many flops are required to compute A + B?

9. How many flops are required to compute AB?

Answer:

$2n^3 - n^2$ flops

10. If A is a diagonal matrix and k is a positive integer, how many flops are required to compute $A^k$?

9.5 Singular Value Decomposition

In this section we will discuss an extension of the diagonalization theory for n x n symmetric matrices to general 
m x n matrices. The results that we will develop in this section have applications to compression, storage, and
transmission of digitized information and form the basis for many of the best computational algorithms that are 
currently available for solving linear systems. 


Decompositions of Square Matrices 

We saw in Formula (2) of Section 7.2 that every symmetric matrix A can be expressed as

$$A = PDP^T \qquad (1)$$

where P is an n x n orthogonal matrix of eigenvectors of A, and D is the diagonal matrix whose diagonal entries are
the eigenvalues corresponding to the column vectors of P. In this section we will call (1) an eigenvalue
decomposition of A (abbreviated EVD of A).

If an n x n matrix A is not symmetric, then it does not have an eigenvalue decomposition, but it does have a
Hessenberg decomposition

$$A = PHP^T$$

in which P is an orthogonal matrix and H is in upper Hessenberg form (Theorem 7.2.4).

Moreover, if A has real eigenvalues, then it has a Schur decomposition

$$A = PSP^T$$

in which P is an orthogonal matrix and S is upper triangular (Theorem 7.2.3).

The eigenvalue, Hessenberg, and Schur decompositions are important in numerical algorithms not only because the
matrices D, H, and S have simpler forms than A, but also because the orthogonal matrices that appear in these
factorizations do not magnify roundoff error. To see why this is so, suppose that $\mathbf{x}$ is a column vector whose entries
are known exactly and that

$$\hat{\mathbf{x}} = \mathbf{x} + \mathbf{e}$$

is the vector that results when roundoff error is present in the entries of $\mathbf{x}$. If P is an orthogonal matrix, then the
length-preserving property of orthogonal transformations implies that

$$\|P\hat{\mathbf{x}} - P\mathbf{x}\| = \|\hat{\mathbf{x}} - \mathbf{x}\| = \|\mathbf{e}\|$$

which tells us that the error in approximating $P\mathbf{x}$ by $P\hat{\mathbf{x}}$ has the same magnitude as the error in approximating $\mathbf{x}$ by
$\hat{\mathbf{x}}$.
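A small numerical experiment illustrates the point. The following minimal Python sketch uses a 2 x 2 rotation matrix as the orthogonal matrix P. For contrast, it also uses a shear matrix N that is invertible but not orthogonal. The particular angle, vector, and error vector are arbitrary values chosen for illustration.

```python
import math

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def error_after(M, x, x_hat):
    """Length of M x_hat - M x, the error after multiplying by M."""
    return norm([a - b for a, b in zip(matvec(M, x_hat), matvec(M, x))])

theta = 0.7
P = [[math.cos(theta), -math.sin(theta)],   # rotation: orthogonal
     [math.sin(theta),  math.cos(theta)]]
N = [[1.0, 100.0],                          # shear: invertible but not orthogonal
     [0.0, 1.0]]

x = [3.0, 4.0]                              # "exact" vector
e = [1e-3, -2e-3]                           # illustrative roundoff error
x_hat = [xi + ei for xi, ei in zip(x, e)]   # x_hat = x + e

print(norm(e))                   # about 0.00224
print(error_after(P, x, x_hat))  # same size as norm(e), up to rounding
print(error_after(N, x, x_hat))  # about 0.199, roughly 89 times larger
```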

There are two main paths that one might follow in looking for other kinds of decompositions of a general square 
matrix A: One might look for decompositions of the form 

$$A = PJP^{-1}$$

in which P is invertible but not necessarily orthogonal, or one might look for decompositions of the form

$$A = U\Sigma V^T$$


in which U and V are orthogonal but not necessarily the same. The first path leads to decompositions in which J is 
either diagonal or a certain kind of block diagonal matrix, called a Jordan canonical form in honor of the French 
mathematician Camille Jordan (see p. 510). Jordan canonical forms, which we will not consider in this text, are 
important theoretically and in certain applications, but they are of lesser importance numerically because of the 
roundoff difficulties that result from the lack of orthogonality in P. In this section we will focus on the second path. 


Singular Values 

Since matrix products of the form $A^TA$ will play an important role in our work, we will begin with two basic
theorems about them. 


THEOREM 9.5.1 

If A is an m x n matrix, then: 

(a) A and $A^TA$ have the same null space.

(b) A and $A^TA$ have the same row space.

(c) $A^T$ and $A^TA$ have the same column space.

(d) A and $A^TA$ have the same rank.


We will prove part (a) and leave the remaining proofs for the exercises.

Proof (a) We must show that every solution of $A\mathbf{x} = \mathbf{0}$ is a solution of $A^TA\mathbf{x} = \mathbf{0}$, and conversely. If $\mathbf{x}_0$ is any
solution of $A\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is also a solution of $A^TA\mathbf{x} = \mathbf{0}$ since

$$A^TA\mathbf{x}_0 = A^T(A\mathbf{x}_0) = A^T\mathbf{0} = \mathbf{0}$$

Conversely, if $\mathbf{x}_0$ is any solution of $A^TA\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is in the null space of $A^TA$ and hence is orthogonal to all
vectors in the row space of $A^TA$ by part (a) of Theorem 4.8.10.

However, $A^TA$ is symmetric, so $\mathbf{x}_0$ is also orthogonal to every vector in the column space of $A^TA$. In particular, $\mathbf{x}_0$
must be orthogonal to the vector $(A^TA)\mathbf{x}_0$; that is,

$$\mathbf{x}_0 \cdot (A^TA)\mathbf{x}_0 = 0$$

Using the first formula in Table 1 of Section 3.2 and properties of the transpose operation we can rewrite this as

$$\mathbf{x}_0^T(A^TA)\mathbf{x}_0 = (A\mathbf{x}_0)^T(A\mathbf{x}_0) = (A\mathbf{x}_0)\cdot(A\mathbf{x}_0) = \|A\mathbf{x}_0\|^2 = 0$$

which implies that $A\mathbf{x}_0 = \mathbf{0}$, thereby proving that $\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{0}$.


THEOREM 9.5.2 


If A is an m x n matrix, then:

(a) $A^TA$ is orthogonally diagonalizable.

(b) The eigenvalues of $A^TA$ are nonnegative.

Proof (a) The matrix $A^TA$, being symmetric, is orthogonally diagonalizable by Theorem 7.2.1.

Proof (b) Since $A^TA$ is orthogonally diagonalizable, there is an orthonormal basis for $R^n$ consisting of
eigenvectors of $A^TA$, say $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. If we let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the corresponding eigenvalues, then for
$1 \le i \le n$ we have

$$\|A\mathbf{v}_i\|^2 = A\mathbf{v}_i \cdot A\mathbf{v}_i = \mathbf{v}_i \cdot A^TA\mathbf{v}_i \qquad \text{[Formula (26) of Section 3.2]}$$

$$= \mathbf{v}_i \cdot \lambda_i\mathbf{v}_i = \lambda_i(\mathbf{v}_i \cdot \mathbf{v}_i) = \lambda_i\|\mathbf{v}_i\|^2 = \lambda_i$$

It follows from this relationship that $\lambda_i \ge 0$.


DEFINITION 1 


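If A is an m x n matrix, and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A^TA$, then the numbers

$$\sigma_1 = \sqrt{\lambda_1},\quad \sigma_2 = \sqrt{\lambda_2},\quad \ldots,\quad \sigma_n = \sqrt{\lambda_n}$$

are called the singular values of A.

We will assume throughout this section that the
eigenvalues of $A^TA$ are named so that

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$$

and hence that

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$$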

EXAMPLE 1 Singular Values 

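Find the singular values of the matrix

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

The first step is to find the eigenvalues of the matrix

$$A^TA = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

The characteristic polynomial of $A^TA$ is

$$\lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1)$$

so the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$, and the singular values of A in order of decreasing
size are

$$\sigma_1 = \sqrt{\lambda_1} = \sqrt{3},\quad \sigma_2 = \sqrt{\lambda_2} = 1$$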
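The steps of Example 1 translate into a short Python sketch. It forms $A^TA$ explicitly and uses the quadratic formula for the eigenvalues of a symmetric 2 x 2 matrix, so it applies only when A has two columns. For larger matrices one would use a library eigenvalue or SVD routine instead. The variable names are our own.

```python
import math

A = [[1, 1],
     [0, 1],
     [1, 0]]

# Entry (i, j) of A^T A is the dot product of columns i and j of A.
AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(2)]
       for i in range(2)]

# For a symmetric 2 x 2 matrix [[p, q], [q, r]] the characteristic polynomial is
# lambda^2 - (p + r) lambda + (p r - q^2); its roots are mid + radius and mid - radius.
p, q, r = AtA[0][0], AtA[0][1], AtA[1][1]
mid = (p + r) / 2
radius = math.sqrt(((p - r) / 2) ** 2 + q * q)
eigenvalues = [mid + radius, mid - radius]        # lambda_1 >= lambda_2 >= 0

# max(..., 0.0) guards against a tiny negative value caused by roundoff.
singular_values = [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]
print(eigenvalues)       # [3.0, 1.0]
print(singular_values)   # [1.7320508075688772, 1.0]
```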


Singular Value Decomposition 


Before turning to the main result in this section, we will find it useful to extend the notion of a “main diagonal” to 
matrices that are not square. We define the main diagonal of an m x n matrix to be the line of entries shown in 
Figure 9.5.1; it starts at the upper left corner and extends diagonally as far as it can go. We will refer to the entries
on the main diagonal as the diagonal entries. 


[Figure 9.5.1: Main diagonal, shown for a 4 x 7 matrix and a 7 x 4 matrix]

We are now ready to consider the main result in this section, which is concerned with a specific way of factoring a 
general m x n matrix A. This factorization, called singular value decomposition (abbreviated SVD), will be given in
two forms, a brief form that captures the main idea, and an expanded form that spells out the details. The proof is 
given at the end of this section. 


Singular Value Decomposition 

If A is an m x n matrix, then A can be expressed in the form 

$$A = U\Sigma V^T$$

where U and V are orthogonal matrices and $\Sigma$ is an m x n matrix whose diagonal entries are the singular
values of A and whose other entries are zero.








Harry Bateman (1882-1946) 


The term singular value is apparently due to the British-born mathematician Harry 
Bateman, who used it in a research paper published in 1908. Bateman emigrated to the United States in 
1910, teaching at Bryn Mawr College, Johns Hopkins University, and finally at the California Institute of 
Technology. Interestingly, he was awarded his Ph.D. by Johns Hopkins in 1913, by which time he
was already an eminent mathematician with 60 publications to his name.

[Image: Courtesy of the Archives, California Institute of Technology]


Singular Value Decomposition (Expanded Form) 

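If A is an m x n matrix of rank k, then A can be factored as

$$A = U\Sigma V^T = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_k & \mathbf{u}_{k+1} & \cdots & \mathbf{u}_m \end{bmatrix}
\left[\begin{array}{cccc|c}
\sigma_1 & 0 & \cdots & 0 & \\
0 & \sigma_2 & \cdots & 0 & 0_{k\times(n-k)} \\
\vdots & \vdots & \ddots & \vdots & \\
0 & 0 & \cdots & \sigma_k & \\ \hline
 & 0_{(m-k)\times k} & & & 0_{(m-k)\times(n-k)}
\end{array}\right]
\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \\ \mathbf{v}_{k+1}^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix}$$

in which U, $\Sigma$, and V have sizes m x m, m x n, and n x n, respectively, and in which

(a) $V = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_n]$ orthogonally diagonalizes $A^TA$.

(b) The nonzero diagonal entries of $\Sigma$ are $\sigma_1 = \sqrt{\lambda_1}, \sigma_2 = \sqrt{\lambda_2}, \ldots, \sigma_k = \sqrt{\lambda_k}$, where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the
nonzero eigenvalues of $A^TA$ corresponding to the column vectors of V.

(c) The column vectors of V are ordered so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$.

(d) $\mathbf{u}_i = \dfrac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \dfrac{1}{\sigma_i}A\mathbf{v}_i \quad (i = 1, 2, \ldots, k)$

(e) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ is an orthonormal basis for col(A).

(f) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_m\}$ is an extension of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ to an orthonormal basis for $R^m$.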











The vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k$ are called the left
singular vectors of A, and the vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are called the right singular vectors
of A.


EXAMPLE 2 Singular Value Decomposition if A Is Not Square 

Find a singular value decomposition of the matrix 

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

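We showed in Example 1 that the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ and that the
corresponding singular values of A are $\sigma_1 = \sqrt{3}$ and $\sigma_2 = 1$. We leave it for you to verify that

$$\mathbf{v}_1 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix}$$

are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively, and that $V = [\mathbf{v}_1 \mid \mathbf{v}_2]$ orthogonally
diagonalizes $A^TA$. From part (d) of Theorem 9.5.4, the vectors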


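$$\mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{\sqrt{3}}{3}\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix}$$

$$\mathbf{u}_2 = \frac{1}{\sigma_2}A\mathbf{v}_2 = (1)\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix}$$

are two of the three column vectors of U. Note that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthonormal, as expected. We could
extend the set $\{\mathbf{u}_1, \mathbf{u}_2\}$ to an orthonormal basis for $R^3$. However, the computations will be easier if
we first remove the messy radicals by multiplying $\mathbf{u}_1$ and $\mathbf{u}_2$ by appropriate scalars. Thus, we will look
for a unit vector $\mathbf{u}_3$ that is orthogonal to

$$\sqrt{6}\,\mathbf{u}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \sqrt{2}\,\mathbf{u}_2 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$$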


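To satisfy these two orthogonality conditions, the vector $\mathbf{u}_3$ must be a solution of the homogeneous
linear system

$$\begin{bmatrix} 2 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$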


We leave it for you to show that a general solution of this system is

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$$


Normalizing the vector on the right yields

$$\mathbf{u}_3 = \begin{bmatrix} -\frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{3} \end{bmatrix}$$
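In $R^3$ there is a shortcut for this step that the text does not use: the cross product of $\sqrt{6}\,\mathbf{u}_1$ and $\sqrt{2}\,\mathbf{u}_2$ is orthogonal to both vectors, so it spans the same solution line as the homogeneous system above. The sketch below assumes only the Python standard library; `cross` and `normalize` are helper names introduced here for illustration.

```python
import math


def cross(a, b):
    """Cross product of two vectors in R^3; the result is orthogonal to both."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(w):
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w]


w = cross([2, 1, 1], [0, -1, 1])   # sqrt(6) u1 and sqrt(2) u2 from the example
u3 = normalize(w)
print(w)   # [2, -2, -2], a multiple of (-1, 1, 1)
```

The sign of a unit vector chosen this way is arbitrary: here the result is the negative of the $\mathbf{u}_3$ displayed above, which is an equally valid third column for U.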


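Thus, the singular value decomposition of A is

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{\sqrt{3}}{3} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} \end{bmatrix}}_{U}\underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma}\underbrace{\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}}_{V^T}$$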


You may want to confirm the validity of this equation by multiplying out the matrices on the right side. 
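One way to carry out that confirmation numerically is sketched below in plain Python: it enters the three factors exactly as displayed, multiplies them, and also checks that $U^TU = I$. Floating-point arithmetic reproduces A only up to rounding errors of order $10^{-16}$; the helper names are illustrative.

```python
import math

r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

# The three factors displayed above, entered row by row.
U = [[r6 / 3, 0.0, -r3 / 3],
     [r6 / 6, -r2 / 2, r3 / 3],
     [r6 / 6, r2 / 2, r3 / 3]]
Sigma = [[r3, 0.0],
         [0.0, 1.0],
         [0.0, 0.0]]
Vt = [[r2 / 2, r2 / 2],
      [r2 / 2, -r2 / 2]]


def matmul(X, Y):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


product = matmul(matmul(U, Sigma), Vt)   # should reproduce A
UtU = matmul(transpose(U), U)            # should be the 3 x 3 identity matrix
```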



Eugenio Beltrami (1835-1900)

Camille Jordan (1838-1922)

Hermann Klaus Weyl (1885-1955)

Gene H. Golub (1932-2007)


The theory of singular value decompositions can be traced back to the work of five 
people: the Italian mathematician Eugenio Beltrami, the French mathematician Camille Jordan, the English 
mathematician James Sylvester (see p. 34), and the German mathematicians Erhard Schmidt (see p. 360)
and Hermann Weyl. More recently, the pioneering efforts of the American mathematician
Gene Golub produced a stable and efficient algorithm for computing it. Beltrami and Jordan were the 
progenitors of the decomposition—Beltrami gave a proof of the result for real, invertible matrices with 
distinct singular values in 1873. Subsequently, Jordan refined the theory and eliminated the unnecessary 
restrictions imposed by Beltrami. Sylvester, apparently unfamiliar with the work of Beltrami and Jordan, 
rediscovered the result in 1889 and suggested its importance. Schmidt was the first person to show that the 
singular value decomposition could be used to approximate a matrix by another matrix with lower rank, 
and, in so doing, he transformed it from a mathematical curiosity to an important practical tool. Weyl 
showed how to find the lower rank approximations in the presence of error. 

[Images: Wikipedia (Beltrami); The Granger Collection, New York (Jordan); Courtesy Electronic Publishing
Services, Inc., New York City (Weyl); Wikipedia (Golub)]





OPTIONAL 


We conclude this section with an optional proof of Theorem 9.5.4. 


For notational simplicity we will prove this theorem in the case where A is an n × n
matrix. To modify the argument for an m × n matrix you need only make the notational adjustments required to
account for the possibility that m > n or n > m.


The matrix $A^TA$ is symmetric, so it has an eigenvalue decomposition

$$A^TA = VDV^T$$

in which the column vectors of

$$V = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n]$$

are unit eigenvectors of $A^TA$, and D is a diagonal matrix whose successive diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the
eigenvalues of $A^TA$ corresponding in succession to the column vectors of V. Since A is assumed to have rank k, it
follows from Theorem 9.5.1 that $A^TA$ also has rank k. It follows as well that D has rank k, since it is similar to $A^TA$
and rank is a similarity invariant. Thus, D can be expressed in the form


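$$D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k, 0, \ldots, 0) \tag{2}$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > 0$. Now let us consider the set of image vectors

$$\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n\} \tag{3}$$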


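This is an orthogonal set, for if i ≠ j, then the orthogonality of $\mathbf{v}_i$ and $\mathbf{v}_j$ implies that

$$A\mathbf{v}_i \cdot A\mathbf{v}_j = \mathbf{v}_i \cdot A^TA\mathbf{v}_j = \mathbf{v}_i \cdot \lambda_j\mathbf{v}_j = \lambda_j(\mathbf{v}_i \cdot \mathbf{v}_j) = 0$$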

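Moreover, the first k vectors in (3) are nonzero since we showed in the proof of Theorem 9.5.2(b) that $\|A\mathbf{v}_i\|^2 = \lambda_i$ for
i = 1, 2, ..., n, and we have assumed that the first k diagonal entries in (2) are positive. Thus,

$$S = \{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_k\}$$

is an orthogonal set of nonzero vectors in the column space of A. But the column space of A has dimension k since

$$\operatorname{rank}(A) = \operatorname{rank}(A^TA) = k$$

and hence S, being a linearly independent set of k vectors, must be an orthogonal basis for col(A). If we now
normalize the vectors in S, we will obtain an orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ for col(A) in which

$$\mathbf{u}_i = \frac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \frac{1}{\sqrt{\lambda_i}}A\mathbf{v}_i \qquad (1 \le i \le k)$$

or, equivalently, in which

$$A\mathbf{v}_1 = \sqrt{\lambda_1}\,\mathbf{u}_1 = \sigma_1\mathbf{u}_1,\quad A\mathbf{v}_2 = \sqrt{\lambda_2}\,\mathbf{u}_2 = \sigma_2\mathbf{u}_2,\quad \ldots,\quad A\mathbf{v}_k = \sqrt{\lambda_k}\,\mathbf{u}_k = \sigma_k\mathbf{u}_k \tag{4}$$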


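It follows from Theorem 6.3.6 that we can extend this to an orthonormal basis

$$\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_n\}$$

for $R^n$. Now let U be the orthogonal matrix

$$U = [\mathbf{u}_1\ \mathbf{u}_2\ \cdots\ \mathbf{u}_k\ \mathbf{u}_{k+1}\ \cdots\ \mathbf{u}_n]$$

and let Σ be the n × n diagonal matrix

$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$$

It follows from (4), and the fact that $A\mathbf{v}_i = \mathbf{0}$ for i > k (since $\|A\mathbf{v}_i\|^2 = \lambda_i = 0$ for those i), that

$$\begin{aligned} U\Sigma &= [\sigma_1\mathbf{u}_1\ \sigma_2\mathbf{u}_2\ \cdots\ \sigma_k\mathbf{u}_k\ \mathbf{0}\ \cdots\ \mathbf{0}] \\ &= [A\mathbf{v}_1\ A\mathbf{v}_2\ \cdots\ A\mathbf{v}_k\ A\mathbf{v}_{k+1}\ \cdots\ A\mathbf{v}_n] \\ &= AV \end{aligned}$$

which we can rewrite using the orthogonality of V as $A = U\Sigma V^T$.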


Concept Review 

Eigenvalue decomposition 
Hessenberg decomposition 
Schur decomposition 
Magnification of roundoff error 
Properties that $A$ and $A^TA$ have in common
$A^TA$ is orthogonally diagonalizable
Eigenvalues of $A^TA$ are nonnegative
Singular values 

Diagonal entries of a matrix that is not square 
Singular value decomposition 

Skills 

Find the singular values of an m × n matrix.

Find a singular value decomposition of an m x n matrix. 


Exercise Set 9.5 




In Exercises 1-4, find the distinct singular values of A.

1. $A = [1\ \ 2\ \ 0]$

Answer:

0, $\sqrt{5}$



Answer: 





In Exercises 5-12, find a singular value decomposition of A. 



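Answer:

$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$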


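Answer:

$$A = \begin{bmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} 8 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$$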



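Answer:

$$A = \begin{bmatrix} \frac{2}{3} & \frac{1}{\sqrt{2}} & \frac{\sqrt{2}}{6} \\ \frac{1}{3} & 0 & -\frac{2\sqrt{2}}{3} \\ -\frac{2}{3} & \frac{1}{\sqrt{2}} & -\frac{\sqrt{2}}{6} \end{bmatrix}\begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$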


10. $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$

11. $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ -1 & 1 \end{bmatrix}$


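Answer:

$$A = \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \end{bmatrix}\begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$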


12. $A = \begin{bmatrix} 6 & 4 \\ 0 & 0 \\ 4 & 0 \end{bmatrix}$


13. Prove: If A is an m × n matrix, then $A^TA$ and $AA^T$ have the same rank.

14. Prove part (d) of Theorem 9.5.1 by using part (a) of the theorem and the fact that A and $A^TA$ have n columns.

15. (a) Prove part (b) of Theorem 9.5.1 by first showing that row($A^TA$) is a subspace of row(A).

(b) Prove part (c) of Theorem 9.5.1 by using part (b).

16. Let $T: R^n \to R^m$ be a linear transformation whose standard matrix A has the singular value decomposition
$A = U\Sigma V^T$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ and $B' = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m\}$ be the column vectors of V and U,
respectively. Show that $\Sigma = [T]_{B',B}$.

17. Show that the singular values of $A^TA$ are the squares of the singular values of A.

18. Show that if $A = U\Sigma V^T$ is a singular value decomposition of A, then U orthogonally diagonalizes $AA^T$.

True-False Exercises 


In parts (a)-(g) determine whether the statement is true or false, and justify your answer.

(a) If A is an m × n matrix, then $A^TA$ is an m × m matrix.

Answer:

False

(b) If A is an m × n matrix, then $A^TA$ is a symmetric matrix.

Answer:

True

(c) If A is an m × n matrix, then the eigenvalues of $A^TA$ are positive real numbers.

Answer:

False

(d) If A is an n × n matrix, then A is orthogonally diagonalizable.

Answer:

False

(e) If A is an m × n matrix, then $A^TA$ is orthogonally diagonalizable.

Answer:

True

(f) The eigenvalues of $A^TA$ are the singular values of A.

Answer:

False

(g) Every m × n matrix has a singular value decomposition.

Answer:

True





9.6 Data Compression Using Singular Value Decomposition 

Efficient transmission and storage of large quantities of digital data has become a major problem in our technological world. In this section 
we will discuss the role that singular value decomposition plays in compressing digital data so that it can be transmitted more rapidly and 
stored in less space. We assume here that you have read Section 9.5 . 


Reduced Singular Value Decomposition 


Algebraically, the zero rows and columns of the matrix Σ in Theorem 9.5.4 are superfluous and can be eliminated by multiplying out the
expression $U\Sigma V^T$ using block multiplication and the partitioning shown in that formula. The products that involve zero blocks as factors
drop out, leaving


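$$A = [\mathbf{u}_1\ \mathbf{u}_2\ \cdots\ \mathbf{u}_k]\begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{bmatrix}\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \end{bmatrix} \tag{1}$$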


which is called a reduced singular value decomposition of A. In this text we will denote the matrices on the right side of (1) by $U_1$, $\Sigma_1$, and
$V_1^T$, respectively, and we will write this equation as

$$A = U_1\Sigma_1V_1^T \tag{2}$$

Note that the sizes of $U_1$, $\Sigma_1$, and $V_1^T$ are m × k, k × k, and k × n, respectively, and that the matrix $\Sigma_1$ is invertible, since its diagonal
entries are positive.


If we multiply out on the right side of (1) using the column-row rule, then we obtain

$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T \tag{3}$$

which is called a reduced singular value expansion of A. This result applies to all matrices, whereas the spectral decomposition [Formula
7 of Section 7.2] applies only to symmetric matrices.


It can be proved that an m × n matrix M has rank 1 if and only if it can be factored as $M = \mathbf{u}\mathbf{v}^T$, where u is a nonzero column vector in
$R^m$ and v is a nonzero column vector in $R^n$. Thus, a reduced singular value expansion expresses a matrix A of rank k as a linear combination
of k rank 1 matrices.


EXAMPLE 1 Reduced Singular Value Decomposition 


Find a reduced singular value decomposition and a reduced singular value expansion of the matrix 


$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$


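In Example 2 of Section 9.5 we found the singular value decomposition

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{\sqrt{3}}{3} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} \end{bmatrix}}_{U}\underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma}\underbrace{\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}}_{V^T} \tag{4}$$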


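Since A has rank 2 (verify), it follows from (1) with k = 2 that the reduced singular value decomposition of A corresponding
to (4) is

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} & 0 \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} \end{bmatrix}\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$$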


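This yields the reduced singular value expansion

$$\begin{aligned}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} &= \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T = \sqrt{3}\begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix}\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix} + (1)\begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix}\begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix} \\
&= \sqrt{3}\begin{bmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \end{bmatrix} + (1)\begin{bmatrix} 0 & 0 \\ -\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} \end{bmatrix}
\end{aligned}$$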


Note that the matrices in the expansion have rank 1, as expected. 
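The expansion can also be evaluated term by term. The sketch below, in plain Python with the illustrative helper names `outer` and `expansion`, rebuilds the matrix of Example 1 from its singular values and singular vectors and also evaluates the first term alone, which is the kind of truncation used for data compression later in this section.

```python
import math


def outer(u, v):
    """Rank 1 matrix u v^T for column vectors u and v given as lists."""
    return [[ui * vj for vj in v] for ui in u]


def expansion(sigmas, us, vs):
    """Sum of sigma_i * u_i v_i^T over the given terms, as in Formula (3)."""
    m, n = len(us[0]), len(vs[0])
    total = [[0.0] * n for _ in range(m)]
    for s, u, v in zip(sigmas, us, vs):
        term = outer(u, v)
        for i in range(m):
            for j in range(n):
                total[i][j] += s * term[i][j]
    return total


r2, r6 = math.sqrt(2), math.sqrt(6)
sigmas = [math.sqrt(3), 1.0]
us = [[r6 / 3, r6 / 6, r6 / 6], [0.0, -r2 / 2, r2 / 2]]
vs = [[r2 / 2, r2 / 2], [r2 / 2, -r2 / 2]]

A = expansion(sigmas, us, vs)                        # both terms: recovers A
first_term = expansion(sigmas[:1], us[:1], vs[:1])   # sqrt(3) u1 v1^T alone
```

The first term alone is the matrix with rows (1, 1), (1/2, 1/2), and (1/2, 1/2), which has rank 1 as expected.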


Data Compression and Image Processing 

Singular value decompositions can be used to “compress” visual information for the purpose of reducing its required storage space and 
speeding up its electronic transmission. The first step in compressing a visual image is to represent it as a numerical matrix from which the 
visual image can be recovered when needed. 

For example, a black and white photograph might be scanned as a rectangular array of pixels (points) and then stored as a matrix A by
assigning each pixel a numerical value in accordance with its gray level. If 256 different gray levels are used (0 = white to 255 = black),
then the entries in the matrix would be integers between 0 and 255. The image can be recovered from the matrix A by printing or
displaying the pixels with their assigned gray levels.


































[Figure: Original | Reconstruction]

In 1924 the U.S. Federal Bureau of Investigation (FBI) began collecting fingerprints and handprints and now 
has more than 30 million such prints in its files. To reduce the storage cost, the FBI began working with the Los Alamos National 
Laboratory, the National Bureau of Standards, and other groups in 1993 to devise rank-based compression methods for storing
prints in digital form. The following figure shows an original fingerprint and a reconstruction from digital data that was 
compressed at a ratio of 26:1. 


If the matrix A has size m × n, then one might store each of its mn entries individually. An alternative procedure is to compute the reduced
singular value expansion

$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T \tag{5}$$

in which $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$, and store the σ's, the u's, and the v's.

When needed, the matrix A (and hence the image it represents) can be reconstructed from (5). Since each $\mathbf{u}_j$ has m entries and each $\mathbf{v}_j$ has n
entries, this method requires storage space for

$$km + kn + k = k(m + n + 1)$$

numbers. Suppose, however, that the singular values $\sigma_{r+1}, \ldots, \sigma_k$ are sufficiently small that dropping the corresponding terms in (5)
produces an acceptable approximation

$$A_r = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T \tag{6}$$

to A and the image that it represents. We call (6) the rank r approximation of A. This matrix requires storage space for only

$$rm + rn + r = r(m + n + 1)$$

numbers, compared to mn numbers required for entry-by-entry storage of A. For example, the rank 100 approximation of a 1000 × 1000
matrix A requires storage for only

$$100(1000 + 1000 + 1) = 200{,}100$$

numbers, compared to the 1,000,000 numbers required for entry-by-entry storage of A, a compression of almost 80%.
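The storage counts above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, and `largest_worthwhile_rank` is an extra helper added for illustration: it returns the largest r for which the rank r approximation still needs fewer numbers than entry-by-entry storage.

```python
def svd_storage(r, m, n):
    """Numbers stored for a rank r approximation of an m x n matrix:
    r vectors u_j (m entries each), r vectors v_j (n entries each),
    and r singular values."""
    return r * (m + n + 1)


def full_storage(m, n):
    """Numbers stored when every entry of an m x n matrix is kept."""
    return m * n


def largest_worthwhile_rank(m, n):
    """Largest r with r(m + n + 1) < mn."""
    return (m * n - 1) // (m + n + 1)


stored = svd_storage(100, 1000, 1000)            # 200,100 numbers
saving = 1 - stored / full_storage(1000, 1000)   # fraction of storage saved, about 0.80
```

For a 1000 × 1000 image, `largest_worthwhile_rank(1000, 1000)` is 499. Beyond that rank, the truncated expansion no longer saves space.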

Figure 9.6.1 shows some approximations of a digitized mandrill image obtained using (6).



Figure 9.6.1 (panels: rank 4, rank 10, rank 20, rank 50, rank 128)








Concept Review 

Reduced singular value decomposition 
Reduced singular value expansion 
Rank of an approximation 

Skills 

Find the reduced singular value decomposition of an m x n matrix. 
Find the reduced singular value expansion of an m × n matrix.


Exercise Set 9.6 


In Exercises 1-4, find a reduced singular value decomposition of A. [Note: Each matrix appears in Exercise Set 9.5, where you were
asked to find its (unreduced) singular value decomposition.]


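1. $A = \begin{bmatrix} -2 & 2 \\ -1 & 1 \\ 2 & -2 \end{bmatrix}$

Answer:

$$A = \begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \\ -\frac{2}{3} \end{bmatrix}\begin{bmatrix} 3\sqrt{2} \end{bmatrix}\begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$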

2. $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$

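3. $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ -1 & 1 \end{bmatrix}$

Answer:

$$A = \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$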


In Exercises 5-8, find a reduced singular value expansion of A. 
5. The matrix A in Exercise 1. 


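Answer:

$$A = 3\sqrt{2}\begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \\ -\frac{2}{3} \end{bmatrix}\begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$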


6. The matrix A in Exercise 2.


7. The matrix A in Exercise 3.


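Answer:

$$A = \sqrt{3}\begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{bmatrix}\begin{bmatrix} 1 & 0 \end{bmatrix} + \sqrt{2}\begin{bmatrix} 0 \\ \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} 0 & 1 \end{bmatrix}$$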




8. The matrix A in Exercise 4. 

9. Suppose A is a 200 × 500 matrix. How many numbers must be stored in the rank 100 approximation of A? Compare this with the
number of entries of A.

Answer:

70,100 numbers must be stored; A has 100,000 entries.

True-False Exercises 

In parts (a)-(c) determine whether the statement is true or false, and justify your answer. Assume that $U_1\Sigma_1V_1^T$ is a reduced singular
value decomposition of an m × n matrix of rank k.

(a) $U_1$ has size m × k.
Answer:

True

(b) $\Sigma_1$ has size k × k.
Answer:

True

(c) $V_1$ has size k × n.
Answer:

False
Supplementary Exercises


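1. Find an LU-decomposition of $A = \begin{bmatrix} -6 & 2 \\ 6 & 0 \end{bmatrix}$.

Answer:

$$A = \begin{bmatrix} -6 & 0 \\ 6 & 2 \end{bmatrix}\begin{bmatrix} 1 & -\frac{1}{3} \\ 0 & 1 \end{bmatrix}$$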


2. Find the LDU-decomposition of the matrix A in Exercise 1.


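3. Find an LU-decomposition of $A = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 4 & 7 \\ 1 & 3 & 7 \end{bmatrix}$.

Answer:

$$A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$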


4. Find the LDU-decomposition of the matrix A in Exercise 3.


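5. Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

(a) Identify the dominant eigenvalue of A and then find the corresponding dominant unit eigenvector v
with positive entries.

(b) Apply the power method with Euclidean scaling to A and $\mathbf{x}_0$, stopping at $\mathbf{x}_5$. Compare your value of
$\mathbf{x}_5$ to the eigenvector v found in part (a).

(c) Apply the power method with maximum entry scaling to A and $\mathbf{x}_0$, stopping at $\mathbf{x}_5$. Compare your
result with the eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Answer:

(a) $\lambda = 3$, $\mathbf{v} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$

(b) $\mathbf{x}_5 \approx \begin{bmatrix} 0.7100 \\ 0.7041 \end{bmatrix}$, compared with $\mathbf{v} \approx \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix}$

(c) $\mathbf{x}_5 \approx \begin{bmatrix} 1 \\ 0.9918 \end{bmatrix}$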
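Both scalings in Exercise 5 can be sketched in a few lines of Python. The sketch assumes the starting vector $\mathbf{x}_0 = (1, 0)$, which reproduces the listed answers; the helper names are illustrative. With Euclidean scaling the computed second component is about 0.70419, which agrees with the listed answer to roughly four decimal places.

```python
import math


def apply(A, x):
    """Matrix-vector product for A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def euclidean_scale(y):
    return math.sqrt(sum(c * c for c in y))


def max_entry_scale(y):
    # The entry of largest absolute value, sign included.
    return max(y, key=abs)


def power_method(A, x0, steps, scale):
    """Return x_steps, where x_{j+1} = A x_j / scale(A x_j)."""
    x = list(x0)
    for _ in range(steps):
        y = apply(A, x)
        s = scale(y)
        x = [c / s for c in y]
    return x


A = [[2.0, 1.0], [1.0, 2.0]]
x0 = [1.0, 0.0]   # assumed starting vector
x5_euclid = power_method(A, x0, 5, euclidean_scale)
x5_max = power_method(A, x0, 5, max_entry_scale)
```

Because each step only rescales, both runs point in the direction of $A^5\mathbf{x}_0 = (122, 121)$. They differ only in the normalization applied.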


6 . Consider the symmetric matrix 




























Discuss the behavior of the power sequence

$$\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_k, \ldots$$

with Euclidean scaling for a general nonzero vector $\mathbf{x}_0$. What is it about the matrix that causes the
observed behavior?

7. Suppose that a symmetric matrix A has distinct eigenvalues $\lambda_1 = 8$, $\lambda_2 = 1.4$, $\lambda_3 = 2.3$, and $\lambda_4 = -8.1$.
What can you say about the convergence of the Rayleigh quotients?

8. Find a singular value decomposition of $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}$.


9 . 

Find a singular value decomposition of A = 


Answer: 


0 

4 ^ 

1 

1_ 

1 

f2 

"2 o' 

1 

f2 

1 

f2 

0 1 

—7= 0 

0 

1 

0 0 

0 0 

1 

~f2 

1 

f2 


10. Find a reduced singular value decomposition and a reduced singular value expansion of the matrix A in 
Exercise 9. 


11. Find the reduced singular value decomposition of the matrix whose singular value decomposition is 


A = 


1 

2 

1 

2 

1 

2 

1 

2 


1 

2 

1 

2 

1 

2 

1 

2 


1 

2 

1 

2 

1 

2 

1 

2 


1 

2 

1 

2 

1 

2 

1 

2 


24 0 0 

0 12 0 

0 0 0 

0 0 0 



Answer: 


* M 

I 

OO O 

s <* 
_1 


1 1 

2 2 

'24 

0 ' 

'2 1 2' 

3 3 3 

4 -8 10 


1 1 

0 

12 

2 2 1 

l 

O 

CS1 

_1 


2 2 

1 1 

L. _l 

3 3 3 



12. Do orthogonally similar matrices have the same singular values? Justify your answer. 

13. If P is the standard matrix for the orthogonal projection of $R^n$ onto a subspace W, what can you say about
the singular values of P?
CHAPTER 10


Applications of Linear 
Algebra 




CHAPTER CONTENTS 

10.1 Constructing Curves and Surfaces Through Specified Points
10.2 Geometric Linear Programming
10.3 The Earliest Applications of Linear Algebra
10.4 Cubic Spline Interpolation
10.5 Markov Chains
10.6 Graph Theory
10.7 Games of Strategy
10.8 Leontief Economic Models
10.9 Forest Management
10.10 Computer Graphics
10.11 Equilibrium Temperature Distributions
10.12 Computed Tomography
10.13 Fractals
10.14 Chaos
10.15 Cryptography
10.16 Genetics
10.17 Age-Specific Population Growth
10.18 Harvesting of Animal Populations
10.19 A Least Squares Model for Human Hearing
10.20 Warps and Morphs
Warps and Morphs 


INTRODUCTION 


This chapter consists of 20 applications of linear algebra. With one clearly marked
exception, each application is in its own independent section, so sections can be deleted or
permuted as desired. Each topic begins with a list of linear algebra prerequisites.

Because our primary objective in this chapter is to present applications of linear algebra, 
proofs are often omitted. Whenever results from other fields are needed, they are stated 
precisely, with motivation where possible, but usually without proof. 
10.1 Constructing Curves and Surfaces Through Specified Points

In this section we describe a technique that uses determinants to construct lines, circles, and general conic 
sections through specified points in the plane. The procedure is also used to pass planes and spheres in 3-space 
through fixed points. 


Prerequisites 

Linear Systems 
Determinants 
Analytic Geometry 


The following theorem follows from Theorem 2.3.8. 


THEOREM 10.1.1 

A homogeneous linear system with as many equations as unknowns has a nontrivial solution if and only 
if the determinant of the coefficient matrix is zero. 


We will now show how this result can be used to determine equations of various curves and surfaces through 
specified points. 


A Line Through Two Points 

Suppose that $(x_1, y_1)$ and $(x_2, y_2)$ are two distinct points in the plane. There exists a unique line

$$c_1x + c_2y + c_3 = 0 \tag{1}$$

that passes through these two points (Figure 10.1.1). Note that $c_1$, $c_2$, and $c_3$ are not all zero and that these
coefficients are unique only up to a multiplicative constant. Because $(x_1, y_1)$ and $(x_2, y_2)$ lie on the line,
substituting them in (1) gives the two equations

$$c_1x_1 + c_2y_1 + c_3 = 0 \tag{2}$$

$$c_1x_2 + c_2y_2 + c_3 = 0 \tag{3}$$






Figure 10.1.1


The three equations, (1), (2), and (3), can be grouped together and rewritten as

$$\begin{aligned} xc_1 + yc_2 + c_3 &= 0 \\ x_1c_1 + y_1c_2 + c_3 &= 0 \\ x_2c_1 + y_2c_2 + c_3 &= 0 \end{aligned}$$

which is a homogeneous linear system of three equations for $c_1$, $c_2$, and $c_3$. Because $c_1$, $c_2$, and $c_3$ are not all
zero, this system has a nontrivial solution, so the determinant of the coefficient matrix of the system must be
zero. That is,

$$\begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0 \tag{4}$$

Consequently, every point (x, y) on the line satisfies (4); conversely, it can be shown that every point (x, y) that
satisfies (4) lies on the line.


EXAMPLE 1 Equation of a Line 

Find the equation of the line that passes through the two points (2, 1) and (3, 7). 


Substituting the coordinates of the two points into Equation (4) gives

$$\begin{vmatrix} x & y & 1 \\ 2 & 1 & 1 \\ 3 & 7 & 1 \end{vmatrix} = 0$$

The cofactor expansion of this determinant along the first row then gives

$$-6x + y + 11 = 0$$
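The cofactor expansion in Example 1 is mechanical, so it can be sketched directly in Python. The function `line_through` is a hypothetical helper written for this illustration: it expands Equation (4) along its symbolic first row (x, y, 1) and returns the coefficients $(c_1, c_2, c_3)$.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


def line_through(p1, p2):
    """Coefficients (c1, c2, c3) of c1*x + c2*y + c3 = 0 from Equation (4),
    obtained by cofactor expansion along the first row (x, y, 1)."""
    (x1, y1), (x2, y2) = p1, p2
    c1 = det2(y1, 1, y2, 1)      # cofactor of x
    c2 = -det2(x1, 1, x2, 1)     # cofactor of y (note the sign)
    c3 = det2(x1, y1, x2, y2)    # cofactor of 1
    return c1, c2, c3


print(line_through((2, 1), (3, 7)))   # (-6, 1, 11), i.e. -6x + y + 11 = 0
```

Since the coefficients are determined only up to a multiplicative constant, any nonzero multiple of the returned triple describes the same line.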


A Circle Through Three Points 

Suppose that there are three distinct points in the plane, $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, not all lying on a
straight line. From analytic geometry we know that there is a unique circle, say,

$$c_1(x^2 + y^2) + c_2x + c_3y + c_4 = 0 \tag{5}$$

that passes through them (Figure 10.1.2). Substituting the coordinates of the three points into this equation gives

$$c_1(x_1^2 + y_1^2) + c_2x_1 + c_3y_1 + c_4 = 0 \tag{6}$$

$$c_1(x_2^2 + y_2^2) + c_2x_2 + c_3y_2 + c_4 = 0 \tag{7}$$

$$c_1(x_3^2 + y_3^2) + c_2x_3 + c_3y_3 + c_4 = 0 \tag{8}$$

As before, Equations (5) through (8) form a homogeneous linear system with a nontrivial solution for $c_1$, $c_2$, $c_3$,
and $c_4$. Thus the determinant of the coefficient matrix is zero:

$$\begin{vmatrix} x^2 + y^2 & x & y & 1 \\ x_1^2 + y_1^2 & x_1 & y_1 & 1 \\ x_2^2 + y_2^2 & x_2 & y_2 & 1 \\ x_3^2 + y_3^2 & x_3 & y_3 & 1 \end{vmatrix} = 0 \tag{9}$$

This is a determinant form for the equation of the circle.



EXAMPLE 2 Equation of a Circle 

Find the equation of the circle that passes through the three points (1, 7), (6, 2), and (4, 6).
Substituting the coordinates of the three points into Equation (9) gives

$$\begin{vmatrix} x^2 + y^2 & x & y & 1 \\ 50 & 1 & 7 & 1 \\ 40 & 6 & 2 & 1 \\ 52 & 4 & 6 & 1 \end{vmatrix} = 0$$

which reduces to

$$10(x^2 + y^2) - 20x - 40y - 200 = 0$$

In standard form this is

$$(x - 1)^2 + (y - 2)^2 = 5^2$$

Thus the circle has center (1, 2) and radius 5.
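The same first-row expansion works for Equation (9), and a small recursive determinant is enough to reproduce Example 2 with exact integer arithmetic. This is a sketch with illustrative function names. Cofactor expansion costs on the order of n! operations, which is fine for these small determinants but is not how large determinants are computed in practice.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def first_row_cofactors(rows):
    """Cofactors C_j of the symbolic first row, given the numeric rows beneath it,
    so that the determinant equation reads sum_j C_j * f_j(x, y) = 0."""
    return [(-1) ** j * det([row[:j] + row[j + 1:] for row in rows])
            for j in range(len(rows) + 1)]


def circle_through(p1, p2, p3):
    """Coefficients of Equation (9), plus the center and squared radius."""
    rows = [[x * x + y * y, x, y, 1] for x, y in (p1, p2, p3)]
    coeffs = first_row_cofactors(rows)
    c1, c2, c3, c4 = coeffs   # c1 == 0 would mean the points are collinear
    h, k = Fraction(-c2, 2 * c1), Fraction(-c3, 2 * c1)
    r_squared = h * h + k * k - Fraction(c4, c1)
    return coeffs, (h, k), r_squared


coeffs, center, r_squared = circle_through((1, 7), (6, 2), (4, 6))
```

Here `coeffs` comes out as [10, -20, -40, -200], matching $10(x^2 + y^2) - 20x - 40y - 200 = 0$. Completing the square then gives center (1, 2) and squared radius 25.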


A General Conic Section Through Five Points 

In his monumental work Principia Mathematica, Isaac Newton posed and solved the following problem (Book
I, Proposition 22, Problem 14): "To describe a conic that shall pass through five given points." Newton solved
this problem geometrically, as shown in Figure 10.1.3, in which he passed an ellipse through the points A, B, D,
P, C; however, the methods of this section can also be applied.



The general equation of a conic section in the plane (a parabola, hyperbola, or ellipse, or degenerate forms of
these curves) is given by

$$c_1x^2 + c_2xy + c_3y^2 + c_4x + c_5y + c_6 = 0$$

This equation contains six coefficients, but we can reduce the number to five if we divide through by any one of
them that is not zero. Thus only five coefficients must be determined, so five distinct points in the plane are
sufficient to determine the equation of the conic section (Figure 10.1.4). As before, the equation can be put in
determinant form (see Exercise 7):

$$\begin{vmatrix} x^2 & xy & y^2 & x & y & 1 \\ x_1^2 & x_1y_1 & y_1^2 & x_1 & y_1 & 1 \\ x_2^2 & x_2y_2 & y_2^2 & x_2 & y_2 & 1 \\ x_3^2 & x_3y_3 & y_3^2 & x_3 & y_3 & 1 \\ x_4^2 & x_4y_4 & y_4^2 & x_4 & y_4 & 1 \\ x_5^2 & x_5y_5 & y_5^2 & x_5 & y_5 & 1 \end{vmatrix} = 0 \tag{10}$$




Figure 10.1.4 (a conic through the points $(x_1, y_1), \ldots, (x_5, y_5)$)

EXAMPLE 3 Equation of an Orbit 

An astronomer who wants to determine the orbit of an asteroid about the Sun sets up a Cartesian 
coordinate system in the plane of the orbit with the Sun at the origin. Astronomical units of 
measurement are used along the axes (1 astronomical unit = mean distance of Earth to Sun = 93 
million miles). By Kepler's first law, the orbit must be an ellipse, so the astronomer makes five 
observations of the asteroid at five different times and finds five points along the orbit to be 
(8.025,8.310), (10.170,6.355), (11.202,3.212), (10.736,0.375), (9.092, -2.267) 

Find the equation of the orbit. 

Substituting the coordinates of the five given points into (10) and rounding to three
decimal places give

$$\begin{vmatrix} x^2 & xy & y^2 & x & y & 1 \\ 64.401 & 66.688 & 69.056 & 8.025 & 8.310 & 1 \\ 103.429 & 64.630 & 40.386 & 10.170 & 6.355 & 1 \\ 125.485 & 35.981 & 10.317 & 11.202 & 3.212 & 1 \\ 115.262 & 4.026 & 0.141 & 10.736 & 0.375 & 1 \\ 82.664 & -20.612 & 5.139 & 9.092 & -2.267 & 1 \end{vmatrix} = 0$$


The cofactor expansion of this determinant along the first row yields 

$$386.802x^2 - 102.895xy + 446.029y^2 - 2476.443x - 1427.998y - 17109.375 = 0$$
Figure 10.1.5 is an accurate diagram of the orbit, together with the five given points. 
Figure 10.1.5
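The computation in Example 3 can be sketched in Python by applying the first-row expansion to the 6 × 6 determinant (10). This sketch uses the unrounded products of the observed coordinates, whereas the text rounds them to three decimals, so its coefficients should agree with the displayed equation only approximately and only up to a common nonzero factor. The helper names are illustrative, and a library routine (for example NumPy's `numpy.linalg.det`, which uses an LU factorization) would be preferable for larger determinants.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (fine for 6 x 6)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def first_row_cofactors(rows):
    """Cofactors of the symbolic first row (x^2, xy, y^2, x, y, 1) of Equation (10)."""
    return [(-1) ** j * det([row[:j] + row[j + 1:] for row in rows])
            for j in range(len(rows) + 1)]


def monomials(x, y):
    """One row of Equation (10): x^2, xy, y^2, x, y, 1."""
    return [x * x, x * y, y * y, x, y, 1.0]


def conic_through(points):
    """Coefficients c1, ..., c6 of the conic through five points."""
    return first_row_cofactors([monomials(x, y) for x, y in points])


orbit_points = [(8.025, 8.310), (10.170, 6.355), (11.202, 3.212),
                (10.736, 0.375), (9.092, -2.267)]
c = conic_through(orbit_points)
ellipse = c[1] ** 2 - 4 * c[0] * c[2] < 0   # negative discriminant indicates an ellipse
```

Substituting any of the five observations back into the computed equation gives zero up to rounding error, since the determinant then has two equal rows.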






A Plane Through Three Points 


In Exercise 8 we ask you to show the following: The plane in 3-space with equation

$$c_1x + c_2y + c_3z + c_4 = 0$$

that passes through three noncollinear points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, and $(x_3, y_3, z_3)$ is given by the
determinant equation

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0 \tag{11}$$


EXAMPLE 4 Equation of a Plane 

The equation of the plane that passes through the three noncollinear points (1, 1, 0), (2, 0, -1),
and (2, 9, 2) is

$$\begin{vmatrix} x & y & z & 1 \\ 1 & 1 & 0 & 1 \\ 2 & 0 & -1 & 1 \\ 2 & 9 & 2 & 1 \end{vmatrix} = 0$$

which reduces to

$$2x - y + 3z - 1 = 0$$
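Equation (11) can be evaluated the same way. The sketch below expands the 4 × 4 determinant along its symbolic first row (x, y, z, 1) and then divides the integer cofactors by their greatest common divisor. The function names are illustrative, and the gcd normalization is an added convenience rather than part of the text's method.

```python
from functools import reduce
from math import gcd


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def first_row_cofactors(rows):
    return [(-1) ** j * det([row[:j] + row[j + 1:] for row in rows])
            for j in range(len(rows) + 1)]


def plane_through(p1, p2, p3):
    """Integer coefficients (c1, c2, c3, c4) of c1 x + c2 y + c3 z + c4 = 0 from
    Equation (11), divided by their greatest common divisor."""
    coeffs = first_row_cofactors([[x, y, z, 1] for x, y, z in (p1, p2, p3)])
    g = reduce(gcd, coeffs)
    if g == 0:
        raise ValueError("points are collinear; Equation (11) degenerates to 0 = 0")
    return tuple(c // g for c in coeffs)


print(plane_through((1, 1, 0), (2, 0, -1), (2, 9, 2)))   # (2, -1, 3, -1)
```

The raw cofactors here are (6, -3, 9, -3). Dividing by 3 gives $2x - y + 3z - 1 = 0$.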


A Sphere Through Four Points 


In Exercise 9 we ask you to show the following: The sphere in 3-space with equation

$$c_1(x^2 + y^2 + z^2) + c_2x + c_3y + c_4z + c_5 = 0$$

that passes through four noncoplanar points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$, and $(x_4, y_4, z_4)$ is given
by the following determinant equation:

$$\begin{vmatrix} x^2 + y^2 + z^2 & x & y & z & 1 \\ x_1^2 + y_1^2 + z_1^2 & x_1 & y_1 & z_1 & 1 \\ x_2^2 + y_2^2 + z_2^2 & x_2 & y_2 & z_2 & 1 \\ x_3^2 + y_3^2 + z_3^2 & x_3 & y_3 & z_3 & 1 \\ x_4^2 + y_4^2 + z_4^2 & x_4 & y_4 & z_4 & 1 \end{vmatrix} = 0 \tag{12}$$


EXAMPLE 5 Equation of a Sphere 








The equation of the sphere that passes through the four points (0, 3, 2), (1, -1, 1), (2, 1, 0),
and (5, 1, 3) is

$$\begin{vmatrix} x^2 + y^2 + z^2 & x & y & z & 1 \\ 13 & 0 & 3 & 2 & 1 \\ 3 & 1 & -1 & 1 & 1 \\ 5 & 2 & 1 & 0 & 1 \\ 35 & 5 & 1 & 3 & 1 \end{vmatrix} = 0$$

This reduces to

$$x^2 + y^2 + z^2 - 4x - 2y - 6z + 5 = 0$$

which in standard form is

$$(x - 2)^2 + (y - 1)^2 + (z - 3)^2 = 9$$
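For Equation (12) the sketch below reuses the first-row expansion, divides by the coefficient of $x^2 + y^2 + z^2$, and completes the square to read off the center and squared radius. It uses exact fractions. The function names are illustrative, and it assumes that the four points are noncoplanar, so that the leading cofactor is nonzero.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def first_row_cofactors(rows):
    return [(-1) ** j * det([row[:j] + row[j + 1:] for row in rows])
            for j in range(len(rows) + 1)]


def sphere_through(p1, p2, p3, p4):
    """Normalized coefficients of Equation (12), plus the center and squared radius."""
    rows = [[x * x + y * y + z * z, x, y, z, 1] for x, y, z in (p1, p2, p3, p4)]
    c = first_row_cofactors(rows)
    if c[0] == 0:
        raise ValueError("points are coplanar; there is no unique sphere")
    normalized = [Fraction(ci, c[0]) for ci in c]            # leading coefficient 1
    center = tuple(-Fraction(ci, 2 * c[0]) for ci in c[1:4])
    radius_squared = sum(h * h for h in center) - normalized[4]
    return normalized, center, radius_squared


coeffs, center, radius_squared = sphere_through((0, 3, 2), (1, -1, 1), (2, 1, 0), (5, 1, 3))
```

For the points of Example 5 this gives the coefficients 1, -4, -2, -6, 5, the center (2, 1, 3), and the squared radius 9. These values agree with the equation above.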


Exercise Set 10.1 


1. Find the equations of the lines that pass through the following points: 

(a) (1, -1), (2, 2)

(b) (0, 1), (1, -1)


Answer: 

(a) y = 3x - 4

(b) y = -2x + 1

2. Find the equations of the circles that pass through the following points: 

(a) (2, 6), (2, 0), (5, 3)

(b) (2, -2), (3, 5), (-4, 6)

Answer:

(a) $x^2 + y^2 - 4x - 6y + 4 = 0$ or $(x - 2)^2 + (y - 3)^2 = 9$

(b) $x^2 + y^2 + 2x - 4y - 20 = 0$ or $(x + 1)^2 + (y - 2)^2 = 25$

3. Find the equation of the conic section that passes through the points (0, 0), (0, -1), (2, 0), (2, -5), and
(4, -1).

Answer:

$x^2 + 2xy + y^2 - 2x + y = 0$ (a parabola)

4. Find the equations of the planes in 3-space that pass through the following points: 

(a) (1, 1, -3), (1, -1, 1), (0, -1, 2)

(b) (2, 3, 1), (2, -1, -1), (1, 2, 1)

Answer:

(a) x + 2y + z = 0

(b) -x + y - 2z + 1 = 0

5. (a) Alter Equation (11) so that it determines the plane that passes through the origin and is parallel to the plane
that passes through three specified noncollinear points.

(b) Find the two planes described in part (a) corresponding to the triplets of points in Exercises 4(a) and 4(b).

Answer:

(a) $$\begin{vmatrix} x & y & z & 0 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0$$

(b) x + 2y + z = 0; -x + y - 2z = 0

6. Find the equations of the spheres in 3-space that pass through the following points: 

(a) (1, 2, 3), (-1, 2, 1), (1, 0, 1), (1, 2, -1)

(b) (0, 1, -2), (1, 3, 1), (2, -1, 0), (3, 1, -1)

Answer:

(a) $x^2 + y^2 + z^2 - 2x - 4y - 2z = -2$ or $(x - 1)^2 + (y - 2)^2 + (z - 1)^2 = 4$

(b) $x^2 + y^2 + z^2 - 2x - 2y = 3$ or $(x - 1)^2 + (y - 1)^2 + z^2 = 5$

7. Show that Equation 10 is the equation of the conic section that passes through five given distinct points in the 
plane. 

8. Show that Equation 11 is the equation of the plane in 3-space that passes through three given noncollinear 
points. 

9. Show that Equation 12 is the equation of the sphere in 3-space that passes through four given noncoplanar 
points. 

10. Find a determinant equation for the parabola of the form

$$c_1y + c_2x^2 + c_3x + c_4 = 0$$

that passes through three given noncollinear points in the plane.

Answer:

$$\begin{vmatrix} y & x^2 & x & 1 \\ y_1 & x_1^2 & x_1 & 1 \\ y_2 & x_2^2 & x_2 & 1 \\ y_3 & x_3^2 & x_3 & 1 \end{vmatrix} = 0$$

11 . What does Equation 9 become if the three distinct points are collinear? 

Answer: 

The equation of the line through the three collinear points 

12 . What does Equation 11 become if the three distinct points are collinear? 

Answer: 

0 = 0 

13 . What does Equation 12 become if the four points are coplanar? 

Answer: 

The equation of the plane through the four coplanar points 

Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB,
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 

T1. The general equation of a quadric surface is given by

$$a_1x^2 + a_2y^2 + a_3z^2 + a_4xy + a_5xz + a_6yz + a_7x + a_8y + a_9z + a_{10} = 0$$

Given nine points on this surface, it may be possible to determine its equation.

(a) Show that if the nine points $(x_i, y_i, z_i)$ for i = 1, 2, 3, ..., 9 lie on this surface, and if they determine uniquely
the equation of this surface, then its equation can be written in determinant form as

$$\begin{vmatrix}
x^2 & y^2 & z^2 & xy & xz & yz & x & y & z & 1 \\
x_1^2 & y_1^2 & z_1^2 & x_1y_1 & x_1z_1 & y_1z_1 & x_1 & y_1 & z_1 & 1 \\
x_2^2 & y_2^2 & z_2^2 & x_2y_2 & x_2z_2 & y_2z_2 & x_2 & y_2 & z_2 & 1 \\
x_3^2 & y_3^2 & z_3^2 & x_3y_3 & x_3z_3 & y_3z_3 & x_3 & y_3 & z_3 & 1 \\
x_4^2 & y_4^2 & z_4^2 & x_4y_4 & x_4z_4 & y_4z_4 & x_4 & y_4 & z_4 & 1 \\
x_5^2 & y_5^2 & z_5^2 & x_5y_5 & x_5z_5 & y_5z_5 & x_5 & y_5 & z_5 & 1 \\
x_6^2 & y_6^2 & z_6^2 & x_6y_6 & x_6z_6 & y_6z_6 & x_6 & y_6 & z_6 & 1 \\
x_7^2 & y_7^2 & z_7^2 & x_7y_7 & x_7z_7 & y_7z_7 & x_7 & y_7 & z_7 & 1 \\
x_8^2 & y_8^2 & z_8^2 & x_8y_8 & x_8z_8 & y_8z_8 & x_8 & y_8 & z_8 & 1 \\
x_9^2 & y_9^2 & z_9^2 & x_9y_9 & x_9z_9 & y_9z_9 & x_9 & y_9 & z_9 & 1
\end{vmatrix} = 0$$

(b) Use the result in part (a) to determine the equation of the quadric surface that passes through the points
(1, 2, 3), (2, 1, 7), (0, 4, 6), (3, -1, 4), (3, 0, 11), (-1, 5, 8), (9, -8, 3), (4, 5, 3), and
(-2, 6, 10).

T2. (a) A hyperplane in the \(n\)-dimensional Euclidean space \(R^n\) has an equation of the form

\[
a_1x_1 + a_2x_2 + a_3x_3 + \cdots + a_nx_n + a_{n+1} = 0
\]

where \(a_i\), \(i = 1, 2, 3, \dots, n + 1\), are constants, not all zero, and \(x_i\), \(i = 1, 2, 3, \dots, n\), are variables for which

\[
(x_1, x_2, x_3, \dots, x_n) \in R^n
\]

A point

\[
(x_{10}, x_{20}, x_{30}, \dots, x_{n0}) \in R^n
\]

lies on this hyperplane if

\[
a_1x_{10} + a_2x_{20} + a_3x_{30} + \cdots + a_nx_{n0} + a_{n+1} = 0
\]

Given that the \(n\) points \((x_{1i}, x_{2i}, x_{3i}, \dots, x_{ni})\), \(i = 1, 2, 3, \dots, n\), lie on this hyperplane and that they uniquely determine the equation of the hyperplane, show that the equation of the hyperplane can be written in determinant form as

\[
\begin{vmatrix}
x_1 & x_2 & x_3 & \cdots & x_n & 1 \\
x_{11} & x_{21} & x_{31} & \cdots & x_{n1} & 1 \\
x_{12} & x_{22} & x_{32} & \cdots & x_{n2} & 1 \\
x_{13} & x_{23} & x_{33} & \cdots & x_{n3} & 1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
x_{1n} & x_{2n} & x_{3n} & \cdots & x_{nn} & 1
\end{vmatrix} = 0
\]

(b) Determine the equation of the hyperplane in \(R^9\) that goes through the following nine points:

(1, 2, 3, 4, 5, 6, 7, 8, 9)    (2, 3, 4, 5, 6, 7, 8, 9, 1)
(3, 4, 5, 6, 7, 8, 9, 1, 2)    (4, 5, 6, 7, 8, 9, 1, 2, 3)
(5, 6, 7, 8, 9, 1, 2, 3, 4)    (6, 7, 8, 9, 1, 2, 3, 4, 5)
(7, 8, 9, 1, 2, 3, 4, 5, 6)    (8, 9, 1, 2, 3, 4, 5, 6, 7)
(9, 1, 2, 3, 4, 5, 6, 7, 8)
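Part (b) can be checked by expanding the determinant of part (a) along its first row. The coefficient of \(x_j\) is then the \((1, j)\) cofactor, and the constant term is the \((1, n + 1)\) cofactor. The sketch below does this in Python with exact arithmetic. `det` and `hyperplane_coefficients` are hypothetical helper names, and each minor is evaluated by row reduction rather than by further cofactor expansion.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination with exact fractions."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if a[i][c] != 0), None)
        if p is None:
            return Fraction(0)               # no pivot: the matrix is singular
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result                 # a row swap changes the sign
        result *= a[c][c]
        for i in range(c + 1, n):
            f = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= f * a[c][j]
    return result

def hyperplane_coefficients(points):
    """Cofactors of the symbolic first row (x_1, ..., x_n, 1) of the determinant."""
    rows = [list(p) + [1] for p in points]   # the numeric rows 2 through n + 1
    coeffs = []
    for j in range(len(rows[0])):
        minor = [row[:j] + row[j + 1:] for row in rows]
        coeffs.append((-1) ** j * det(minor))
    return coeffs

# The nine points of part (b): the cyclic shifts of (1, 2, ..., 9)
points = [tuple((i + k) % 9 + 1 for k in range(9)) for i in range(9)]
coeffs = hyperplane_coefficients(points)
print(*(c / coeffs[0] for c in coeffs))      # coefficients scaled so that a1 = 1
```

The coordinates of each of the nine points sum to 45, so the scaled output should be nine 1's followed by -45. That output is the hyperplane \(x_1 + x_2 + \cdots + x_9 = 45\).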



10.2 Geometric Linear Programming 

In this section we describe a geometric technique for maximizing or minimizing a linear expression in two 
variables subject to a set of linear constraints. 


Prerequisites 

Linear Systems 
Linear Inequalities 


Linear Programming 


The study of linear programming theory has expanded greatly since the pioneering work of George Dantzig in 
the late 1940s. Today, linear programming is applied to a wide variety of problems in industry and science. In 
this section we present a geometric approach to the solution of simple linear programming problems. Let us 
begin with some examples. 

EXAMPLE 1 Maximizing Sales Revenue 


A candy manufacturer has 130 pounds of chocolate-covered cherries and 170 pounds of 
chocolate-covered mints in stock. He decides to sell them in the form of two different mixtures. 
One mixture will contain half cherries and half mints by weight and will sell for $2.00 per 
pound. The other mixture will contain one-third cherries and two-thirds mints by weight and 
will sell for $1.25 per pound. How many pounds of each mixture should the candy 
manufacturer prepare in order to maximize his sales revenue? 


Let the mixture of half cherries and half mints be called mix A, and let \(x_1\) be the number of pounds of this mixture to be prepared. Let the mixture of one-third cherries and two-thirds mints be called mix B, and let \(x_2\) be the number of pounds of this mixture to be prepared. Since mix A sells for $2.00 per pound and mix B sells for $1.25 per pound, the total sales \(z\) (in dollars) will be

\[
z = 2.00x_1 + 1.25x_2
\]


Since each pound of mix A contains \(\tfrac{1}{2}\) pound of cherries and each pound of mix B contains \(\tfrac{1}{3}\) pound of cherries, the total number of pounds of cherries used in both mixtures is

\[
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2
\]

Similarly, since each pound of mix A contains \(\tfrac{1}{2}\) pound of mints and each pound of mix B contains \(\tfrac{2}{3}\) pound of mints, the total number of pounds of mints used in both mixtures is

\[
\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2
\]

Because the manufacturer can use at most 130 pounds of cherries and 170 pounds of mints, we must have

\[
\begin{aligned}
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\
\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170
\end{aligned}
\]


Furthermore, since \(x_1\) and \(x_2\) cannot be negative numbers, we must have

\[
x_1 \ge 0 \quad\text{and}\quad x_2 \ge 0
\]

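The problem can therefore be formulated mathematically as follows: Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 2.00x_1 + 1.25x_2
\]

subject to

\[
\begin{aligned}
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\
\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]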


Later in this section we will show how to solve this type of mathematical problem 
geometrically. 


EXAMPLE 2 Maximizing Annual Yield 

A woman has up to $10,000 to invest. Her broker suggests investing in two bonds, A and B. 
Bond A is a rather risky bond with an annual yield of 10%, and bond B is a rather safe bond 
with an annual yield of 7%. After some consideration, she decides to invest at most $6000 in 
bond A, to invest at least $2000 in bond B, and to invest at least as much in bond A as in bond 
B. How should she invest her money in order to maximize her annual yield? 

Let \(x_1\) be the number of dollars to be invested in bond A, and let \(x_2\) be the number of dollars to be invested in bond B. Since each dollar invested in bond A earns $.10 per year and each dollar invested in bond B earns $.07 per year, the total dollar amount \(z\) earned each year by both bonds is

\[
z = 0.10x_1 + 0.07x_2
\]


The constraints imposed can be formulated mathematically as follows:

Invest no more than $10,000: \(x_1 + x_2 \le 10{,}000\)
Invest at most $6000 in bond A: \(x_1 \le 6000\)
Invest at least $2000 in bond B: \(x_2 \ge 2000\)
Invest at least as much in bond A as in bond B: \(x_1 \ge x_2\)


We also have the implicit assumption that \(x_1\) and \(x_2\) are nonnegative:

\[
x_1 \ge 0 \quad\text{and}\quad x_2 \ge 0
\]

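Thus the complete mathematical formulation of the problem is as follows: Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 0.10x_1 + 0.07x_2
\]

subject to

\[
\begin{aligned}
x_1 + x_2 &\le 10{,}000 \\
x_1 &\le 6000 \\
x_2 &\ge 2000 \\
x_1 - x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]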


EXAMPLE 3 Minimizing Cost 


A student desires to design a breakfast of cornflakes and milk that is as economical as possible. On the basis of what he eats during his other meals, he decides that his breakfast should supply him with at least 9 grams of protein, at least \(\tfrac{1}{3}\) the recommended daily allowance (RDA) of vitamin D, and at least \(\tfrac{1}{4}\) the RDA of calcium. He finds the following nutrition and cost information on the milk and cornflakes containers:



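\[
\begin{array}{lcc}
 & \text{Milk } (\tfrac{1}{2}\text{ cup}) & \text{Cornflakes (1 ounce)} \\ \hline
\text{Cost} & 7.5 \text{ cents} & 5.0 \text{ cents} \\
\text{Protein} & 4 \text{ grams} & 2 \text{ grams} \\
\text{Vitamin D} & \tfrac{1}{8} \text{ of RDA} & \tfrac{1}{10} \text{ of RDA} \\
\text{Calcium} & \tfrac{1}{6} \text{ of RDA} & \text{None}
\end{array}
\]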


In order not to have his mixture too soggy or too dry, the student decides to limit himself to 
mixtures that contain 1 to 3 ounces of cornflakes per cup of milk, inclusive. What quantities of 
milk and cornflakes should he use to minimize the cost of his breakfast? 


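Let \(x_1\) be the quantity of milk used (measured in \(\tfrac{1}{2}\)-cup units), and let \(x_2\) be the quantity of cornflakes used (measured in 1-ounce units). Then if \(z\) is the cost of the breakfast in cents, we may write the following.

Cost of breakfast: \(z = 7.5x_1 + 5.0x_2\)
At least 9 grams protein: \(4x_1 + 2x_2 \ge 9\)
At least \(\tfrac{1}{3}\) RDA vitamin D: \(\tfrac{1}{8}x_1 + \tfrac{1}{10}x_2 \ge \tfrac{1}{3}\)
At least \(\tfrac{1}{4}\) RDA calcium: \(\tfrac{1}{6}x_1 \ge \tfrac{1}{4}\)
At least 1 ounce cornflakes per cup (two \(\tfrac{1}{2}\) cups) of milk: \(\tfrac{x_2}{x_1} \ge \tfrac{1}{2}\) (or \(x_1 - 2x_2 \le 0\))
At most 3 ounces cornflakes per cup (two \(\tfrac{1}{2}\) cups) of milk: \(\tfrac{x_2}{x_1} \le \tfrac{3}{2}\) (or \(3x_1 - 2x_2 \ge 0\))

As before, we also have the implicit assumption that \(x_1 \ge 0\) and \(x_2 \ge 0\). Thus the complete mathematical formulation of the problem is as follows: Find values of \(x_1\) and \(x_2\) that minimize

\[
z = 7.5x_1 + 5.0x_2
\]

subject to

\[
\begin{aligned}
4x_1 + 2x_2 &\ge 9 \\
\tfrac{1}{8}x_1 + \tfrac{1}{10}x_2 &\ge \tfrac{1}{3} \\
\tfrac{1}{6}x_1 &\ge \tfrac{1}{4} \\
x_1 - 2x_2 &\le 0 \\
3x_1 - 2x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]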


Geometric Solution of Linear Programming Problems 

Each of the preceding three examples is a special case of the following problem. 


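Problem

Find values of \(x_1\) and \(x_2\) that either maximize or minimize

\[
z = c_1x_1 + c_2x_2 \tag{1}
\]

subject to

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 \;&(\le)(\ge)(=)\; b_1 \\
a_{21}x_1 + a_{22}x_2 \;&(\le)(\ge)(=)\; b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 \;&(\le)(\ge)(=)\; b_m
\end{aligned} \tag{2}
\]

\[
x_1 \ge 0, \quad x_2 \ge 0 \tag{3}
\]

In each of the \(m\) conditions of (2), any one of the symbols \(\le\), \(\ge\), and \(=\) may be used.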

The problem above is called the general linear programming problem in two variables. The linear function \(z\) in (1) is called the objective function. Equations (2) and (3) are called the constraints; in particular, the equations in (3) are called the nonnegativity constraints on the variables \(x_1\) and \(x_2\).

We will now show how to solve a linear programming problem in two variables graphically. A pair of values \((x_1, x_2)\) that satisfies all of the constraints is called a feasible solution. The set of all feasible solutions determines a subset of the \(x_1x_2\)-plane called the feasible region. Our desire is to find a feasible solution that maximizes (or, in a minimization problem, minimizes) the objective function. Such a solution is called an optimal solution.

To examine the feasible region of a linear programming problem, let us note that each constraint of the form

\[
a_{i1}x_1 + a_{i2}x_2 = b_i
\]

defines a line in the \(x_1x_2\)-plane, whereas each constraint of the form

\[
a_{i1}x_1 + a_{i2}x_2 \le b_i \quad\text{or}\quad a_{i1}x_1 + a_{i2}x_2 \ge b_i
\]

defines a half-plane that includes its boundary line

\[
a_{i1}x_1 + a_{i2}x_2 = b_i
\]

Thus the feasible region is always an intersection of finitely many lines and half-planes. For example, the four constraints

\[
\begin{aligned}
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\
\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

of Example 1 define the half-planes illustrated in parts (a), (b), (c), and (d) of Figure 10.2.1. The feasible region of this problem is thus the intersection of these four half-planes, which is illustrated in Figure 10.2.1e.



Figure 10.2.1 (a) through (d) The half-planes defined by the four constraints of Example 1; (e) the feasible region, with extreme points (0, 0), (0, 255), (180, 120), and (260, 0).


It can be shown that the feasible region of a linear programming problem has a boundary consisting of a finite number of straight line segments. If the feasible region can be enclosed in a sufficiently large circle, it is called bounded (Figure 10.2.1e); otherwise, it is called unbounded (see Figure 10.2.5). If the feasible region is empty (contains no points), then the constraints are inconsistent and the linear programming problem has no solution (see Figure 10.2.6).

Those boundary points of a feasible region that are intersections of two of the straight line boundary segments are called extreme points. (They are also called corner points and vertex points.) For example, in Figure 10.2.1e, we see that the feasible region of Example 1 has four extreme points:

\[
(0, 0), \quad (0, 255), \quad (180, 120), \quad (260, 0) \tag{4}
\]
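The extreme points in (4) can also be found mechanically. Intersect the boundary lines of every pair of constraints, then keep the intersection points that satisfy all of the constraints. The brute-force sketch below assumes each constraint has been rewritten as \(a_1x_1 + a_2x_2 \le b\), with any \(\ge\) constraint multiplied by \(-1\). It is practical only for small problems and is not how production solvers work.

```python
from fractions import Fraction as F
from itertools import combinations

# Constraints of Example 1 as (a1, a2, b), meaning a1*x1 + a2*x2 <= b
constraints = [
    (F(1, 2), F(1, 3), 130),   # cherries
    (F(1, 2), F(2, 3), 170),   # mints
    (-1, 0, 0),                # x1 >= 0, multiplied by -1
    (0, -1, 0),                # x2 >= 0, multiplied by -1
]

def intersect(c1, c2):
    """Intersection of two boundary lines by Cramer's rule, or None if parallel."""
    (a, b, e), (c, d, f) = [tuple(F(v) for v in k) for k in (c1, c2)]
    den = a * d - b * c
    if den == 0:
        return None
    return ((e * d - b * f) / den, (a * f - e * c) / den)

def feasible(point):
    x1, x2 = point
    return all(a1 * x1 + a2 * x2 <= b for a1, a2, b in constraints)

vertices = set()
for c1, c2 in combinations(constraints, 2):
    point = intersect(c1, c2)
    if point is not None and feasible(point):
        vertices.add(point)

print(sorted((float(x1), float(x2)) for x1, x2 in vertices))
# [(0.0, 0.0), (0.0, 255.0), (180.0, 120.0), (260.0, 0.0)]
```

Two of the six intersection points, (0, 390) and (340, 0), are discarded because each violates one of the constraints.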


The importance of the extreme points of a feasible region is shown by the following theorem. 


THEOREM 10.2.1 Maximum and Minimum Values

If the feasible region of a linear programming problem is nonempty and bounded, then the objective 
function attains both a maximum and a minimum value, and these occur at extreme points of the 
feasible region. If the feasible region is unbounded, then the objective function may or may not attain 
a maximum or minimum value; however, if it attains a maximum or minimum value, it does so at an 
extreme point. 














Figure 10.2.2 suggests the idea behind the proof of this theorem. Since the objective function

\[
z = c_1x_1 + c_2x_2
\]

of a linear programming problem is a linear function of \(x_1\) and \(x_2\), its level curves (the curves along which \(z\) has constant values) are straight lines. As we move in a direction perpendicular to these level curves, the objective function either increases or decreases monotonically. Within a bounded feasible region, the maximum and minimum values of \(z\) must therefore occur at extreme points, as Figure 10.2.2 indicates.



In the next few examples we use Theorem 10.2.1 to solve several linear programming problems and illustrate 
the variations in the nature of the solutions that may occur. 

EXAMPLE 4 Example 1 Revisited 

Figure 10.2.1e shows that the feasible region of Example 1 is bounded. Consequently, from Theorem 10.2.1 the objective function

\[
z = 2.00x_1 + 1.25x_2
\]

attains both its minimum and maximum values at extreme points. The four extreme points and the corresponding values of \(z\) are given in the following table.

\[
\begin{array}{cc}
\text{Extreme point } (x_1, x_2) & \text{Value of } z = 2.00x_1 + 1.25x_2 \\ \hline
(0, 0) & 0 \\
(0, 255) & 318.75 \\
(180, 120) & 510.00 \\
(260, 0) & 520.00
\end{array}
\]

We see that the largest value of \(z\) is 520.00 and the corresponding optimal solution is (260, 0). Thus the candy manufacturer attains maximum sales of $520 when he produces 260 pounds of mixture A and none of mixture B.
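The table in Example 4 amounts to evaluating \(z\) at each point of (4) and keeping the largest value. A minimal sketch follows, with 1.25 written exactly as 5/4 so that no rounding occurs. The function name `z` is ours.

```python
from fractions import Fraction as F

# Extreme points (4) of Example 1
extreme_points = [(0, 0), (0, 255), (180, 120), (260, 0)]

def z(x1, x2):
    """Sales revenue z = 2.00 x1 + 1.25 x2, in exact arithmetic."""
    return 2 * x1 + F(5, 4) * x2

values = {p: z(*p) for p in extreme_points}
best = max(values, key=values.get)
print(best, float(values[best]))   # (260, 0) 520.0
```

For a minimization problem with a bounded feasible region, `min` replaces `max`.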








EXAMPLE 5 Using Theorem 10.2.1

Find values of \(x_1\) and \(x_2\) that maximize

\[
z = x_1 + 3x_2
\]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 24 \\
x_1 - x_2 &\le 7 \\
x_2 &\le 6 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

In Figure 10.2.3 we have drawn the feasible region of this problem. Since it is bounded, the maximum value of \(z\) is attained at one of the five extreme points. The values of the objective function at the five extreme points are given in the following table.

Figure 10.2.3 The feasible region bounded by the lines \(2x_1 + 3x_2 = 24\), \(x_1 - x_2 = 7\), \(x_2 = 6\), and the coordinate axes, with extreme points (0, 0), (0, 6), (3, 6), (9, 2), and (7, 0).

\[
\begin{array}{cc}
\text{Extreme point } (x_1, x_2) & \text{Value of } z = x_1 + 3x_2 \\ \hline
(0, 6) & 18 \\
(3, 6) & 21 \\
(9, 2) & 15 \\
(7, 0) & 7 \\
(0, 0) & 0
\end{array}
\]

From this table, the maximum value of \(z\) is 21, which is attained at \(x_1 = 3\) and \(x_2 = 6\).


EXAMPLE 6 Using Theorem 10.2.1

Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 4x_1 + 6x_2
\]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 24 \\
x_1 - x_2 &\le 7 \\
x_2 &\le 6 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

The constraints in this problem are identical to the constraints in Example 5, so the feasible region of this problem is also given by Figure 10.2.3. The values of the objective function at the extreme points are given in the following table.

\[
\begin{array}{cc}
\text{Extreme point } (x_1, x_2) & \text{Value of } z = 4x_1 + 6x_2 \\ \hline
(0, 6) & 36 \\
(3, 6) & 48 \\
(9, 2) & 48 \\
(7, 0) & 28 \\
(0, 0) & 0
\end{array}
\]

We see that the objective function attains a maximum value of 48 at two adjacent extreme points, (3, 6) and (9, 2). This shows that an optimal solution to a linear programming problem need not be unique. As we ask you to show in Exercise 10, if the objective function has the same value at two adjacent extreme points, it has the same value at all points on the straight line boundary segment connecting the two extreme points. Thus, in this example the maximum value of \(z\) is attained at all points on the straight line segment connecting the extreme points (3, 6) and (9, 2).
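The claim in Exercise 10 can be spot-checked for Example 6. Evaluate \(z\) at points \(t(3, 6) + (1 - t)(9, 2)\) for several values of \(t\) in \([0, 1]\). This is a numerical illustration, not the requested proof. The proof rests on the identity \(z(tp + (1 - t)q) = t\,z(p) + (1 - t)\,z(q)\), which holds because \(z\) is linear.

```python
from fractions import Fraction as F

def z(x1, x2):
    return 4 * x1 + 6 * x2

p, q = (3, 6), (9, 2)            # the two adjacent optimal extreme points
values = set()
for k in range(11):
    t = F(k, 10)                 # t = 0, 1/10, ..., 1
    x1 = t * p[0] + (1 - t) * q[0]
    x2 = t * p[1] + (1 - t) * q[1]
    values.add(z(x1, x2))
print(values)                    # {Fraction(48, 1)}
```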


EXAMPLE 7 The Feasible Region Is a Line Segment 


Find values of \(x_1\) and \(x_2\) that minimize

\[
z = 2x_1 - x_2
\]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &= 12 \\
2x_1 - 3x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

In Figure 10.2.4 we have drawn the feasible region of this problem. Because one of the constraints is an equality constraint, the feasible region is a straight line segment with two extreme points. The values of \(z\) at the two extreme points are given in the following table.

Figure 10.2.4 The feasible region: the segment of the line \(2x_1 + 3x_2 = 12\) from (3, 2), where it meets the line \(2x_1 - 3x_2 = 0\), to (6, 0).

\[
\begin{array}{cc}
\text{Extreme point } (x_1, x_2) & \text{Value of } z = 2x_1 - x_2 \\ \hline
(3, 2) & 4 \\
(6, 0) & 12
\end{array}
\]

The minimum value of \(z\) is thus 4 and is attained at \(x_1 = 3\) and \(x_2 = 2\).
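The equality constraint determines \(x_2\) in terms of \(x_1\), so Example 7 can also be checked by reducing it to a problem in one variable. The substitution \(x_1 = t\) below is introduced here for illustration:

\[
\begin{aligned}
x_2 &= \tfrac{1}{3}(12 - 2t) \\
2x_1 - 3x_2 \ge 0 &\iff 4t - 12 \ge 0 \iff t \ge 3 \\
x_2 \ge 0 &\iff t \le 6 \\
z = 2x_1 - x_2 &= 2t - \tfrac{1}{3}(12 - 2t) = \tfrac{1}{3}(8t - 12)
\end{aligned}
\]

Since \(z\) increases with \(t\) on \(3 \le t \le 6\), its minimum occurs at \(t = 3\). That is the point \((3, 2)\), where \(z = 4\), in agreement with the table.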


EXAMPLE 8 Using Theorem 10.2.1 

Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 2x_1 + 5x_2
\]

subject to

\[
\begin{aligned}
2x_1 + x_2 &\ge 8 \\
-4x_1 + x_2 &\le 2 \\
2x_1 - 3x_2 &\le 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

The feasible region of this linear programming problem is illustrated in Figure 10.2.5. Since it is unbounded, we are not assured by Theorem 10.2.1 that the objective function attains a maximum value. In fact, it is easily seen that since the feasible region contains points for which both \(x_1\) and \(x_2\) are arbitrarily large and positive, the objective function

\[
z = 2x_1 + 5x_2
\]

can be made arbitrarily large and positive. This problem has no optimal solution. Instead, we say the problem has an unbounded solution.

Figure 10.2.5 The unbounded feasible region of Example 8, with boundary lines \(2x_1 + x_2 = 8\), \(-4x_1 + x_2 = 2\), and \(2x_1 - 3x_2 = 0\) and extreme points (1, 6) and (3, 2).
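One concrete way to see the unboundedness is to follow a single ray through the feasible region. Along \(x_1 = x_2 = t\), every constraint holds once \(t \ge 8/3\), while \(z = 7t\) grows without bound. This ray is our choice, and many other rays work. A quick check in Python follows; `feasible` and `z` are hypothetical helper names.

```python
from fractions import Fraction as F

def feasible(x1, x2):
    """All five constraints of Example 8."""
    return (2 * x1 + x2 >= 8 and -4 * x1 + x2 <= 2
            and 2 * x1 - 3 * x2 <= 0 and x1 >= 0 and x2 >= 0)

def z(x1, x2):
    return 2 * x1 + 5 * x2

# Points (t, t) on the ray x1 = x2; the first constraint needs 3t >= 8
for t in [F(8, 3), 10, 100, 1000]:
    print(t, feasible(t, t), z(t, t))
```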


EXAMPLE 9 Using Theorem 10.2.1 

Find values of \(x_1\) and \(x_2\) that maximize

\[
z = -5x_1 + x_2
\]

subject to

\[
\begin{aligned}
2x_1 + x_2 &\ge 8 \\
-4x_1 + x_2 &\le 2 \\
2x_1 - 3x_2 &\le 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

The above constraints are the same as those in Example 8, so the feasible region of this problem is also given by Figure 10.2.5. In Exercise 11 we ask you to show that the objective function of this problem attains a maximum within the feasible region. By Theorem 10.2.1, this maximum must be attained at an extreme point. The values of \(z\) at the two extreme points of the feasible region are given in the following table.

\[
\begin{array}{cc}
\text{Extreme point } (x_1, x_2) & \text{Value of } z = -5x_1 + x_2 \\ \hline
(1, 6) & 1 \\
(3, 2) & -13
\end{array}
\]

The maximum value of \(z\) is thus 1 and is attained at the extreme point \(x_1 = 1\), \(x_2 = 6\).


EXAMPLE 10 Inconsistent Constraints 







Find values of \(x_1\) and \(x_2\) that minimize

\[
z = 3x_1 - 8x_2
\]

subject to

\[
\begin{aligned}
2x_1 - x_2 &\le 4 \\
3x_1 + 11x_2 &\le 33 \\
3x_1 + 4x_2 &\ge 24 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

As can be seen from Figure 10.2.6, the intersection of the five half-planes defined by the five constraints is empty. This linear programming problem has no feasible solutions since the constraints are inconsistent.

Figure 10.2.6 There are no points common to all five shaded half-planes.
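The emptiness in Example 10 can also be confirmed algebraically. The sketch below swaps in Fourier-Motzkin elimination, a standard technique for systems of linear inequalities that the text does not use. Each constraint with a positive \(x_1\) coefficient is combined with each constraint with a negative \(x_1\) coefficient, using positive multipliers chosen so that \(x_1\) cancels. The result is a set of bounds on \(x_2\) alone, and the original system has a solution exactly when those bounds are compatible. The variable names are ours.

```python
from fractions import Fraction as F

# Constraints of Example 10 as (a, b, c), meaning a*x1 + b*x2 <= c
constraints = [
    (2, -1, 4),       # 2x1 - x2 <= 4
    (3, 11, 33),      # 3x1 + 11x2 <= 33
    (-3, -4, -24),    # 3x1 + 4x2 >= 24, multiplied by -1
    (-1, 0, 0),       # x1 >= 0
    (0, -1, 0),       # x2 >= 0
]

def eliminate_x1(cons):
    """One Fourier-Motzkin step; returns pairs (b, c) meaning b*x2 <= c."""
    pos = [k for k in cons if k[0] > 0]
    neg = [k for k in cons if k[0] < 0]
    out = [(b, c) for a, b, c in cons if a == 0]
    for ap, bp, cp in pos:
        for an, bn, cn in neg:
            # (-an) * (row with ap > 0) + ap * (row with an < 0): the x1 terms cancel
            out.append((-an * bp + ap * bn, -an * cp + ap * cn))
    return out

lo = hi = None        # the projected constraints say lo <= x2 <= hi
empty = False
for b, c in eliminate_x1(constraints):
    if b > 0:
        hi = F(c, b) if hi is None else min(hi, F(c, b))
    elif b < 0:
        lo = F(c, b) if lo is None else max(lo, F(c, b))
    elif c < 0:
        empty = True  # the inequality 0 <= c is false
if lo is not None and hi is not None and lo > hi:
    empty = True
print(lo, hi, empty)   # 36/11 9/7 True
```

The bound \(x_2 \ge 36/11\) comes from the first and third constraints. The bound \(x_2 \le 9/7\) comes from the second and third. No \(x_2\) can satisfy both.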


Exercise Set 10.2 


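1. Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 3x_1 + 2x_2
\]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 6 \\
2x_1 - x_2 &\ge 0 \\
x_1 &\le 2 \\
x_2 &\le 1 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Answer:

\(x_1 = 2\), \(x_2 = \tfrac{2}{3}\); maximum value of \(z = \tfrac{22}{3}\)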




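2. Find values of \(x_1\) and \(x_2\) that minimize

\[
z = 3x_1 - 5x_2
\]

subject to

\[
\begin{aligned}
2x_1 - x_2 &\le -\,? \\
4x_1 - x_2 &\ge\ ? \\
x_2 &\le\ ? \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Answer:

No feasible solutions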

3. Find values of \(x_1\) and \(x_2\) that minimize

\[
z = -3x_1 + 2x_2
\]

subject to

\[
\begin{aligned}
3x_1 - x_2 &\ge -5 \\
-x_1 + x_2 &\ge 1 \\
2x_1 + 4x_2 &\ge 12 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Answer:

Unbounded solution

4. Solve the linear programming problem posed in Example 2. 
Answer: 


Invest $6000 in bond A and $4000 in bond B; the annual yield is $880. 

5. Solve the linear programming problem posed in Example 3. 


Answer:

\(\tfrac{7}{9}\) cup of milk, \(\tfrac{25}{18}\) ounces of cornflakes; minimum cost \(= \tfrac{335}{18} \approx 18.6\)¢
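The answer to Exercise 5 can be checked exactly against the constraints of Example 3. The sketch below computes each constraint's surplus (left side minus right side, with every constraint written in \(\ge\) form). All surpluses should be nonnegative, and the protein and vitamin D constraints are met with equality. That is why the optimum sits at the corner where those two boundary lines cross. This check confirms feasibility and the cost; it does not by itself prove optimality.

```python
from fractions import Fraction as F

x1 = 2 * F(7, 9)     # 7/9 cup of milk, measured in 1/2-cup units
x2 = F(25, 18)       # 25/18 ounces of cornflakes

# Surplus of each constraint of Example 3, written in ">=" form
surplus = {
    "protein": 4 * x1 + 2 * x2 - 9,
    "vitamin D": F(1, 8) * x1 + F(1, 10) * x2 - F(1, 3),
    "calcium": F(1, 6) * x1 - F(1, 4),
    "at least 1 oz per cup": 2 * x2 - x1,      # from x1 - 2 x2 <= 0
    "at most 3 oz per cup": 3 * x1 - 2 * x2,   # from 3 x1 - 2 x2 >= 0
}
cost = F(15, 2) * x1 + 5 * x2                   # 7.5 x1 + 5.0 x2 cents

for name, s in surplus.items():
    print(f"{name}: {s}")
print("cost:", cost, "=", float(cost))
```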


6. In Example 5 the constraint \(x_1 - x_2 \le 7\) is said to be nonbinding because it can be removed from the problem without affecting the solution. Likewise, the constraint \(x_2 \le 6\) is said to be binding because removing it will change the solution.

(a) Which of the remaining constraints are nonbinding and which are binding?

(b) For what values of the right-hand side of the nonbinding constraint \(x_1 - x_2 \le 7\) will this constraint become binding? For what values will the resulting feasible set be empty?

(c) For what values of the right-hand side of the binding constraint \(x_2 \le 6\) will this constraint become nonbinding? For what values will the resulting feasible set be empty?

Answer:

(a) \(x_1 \ge 0\) and \(x_2 \ge 0\) are nonbinding; \(2x_1 + 3x_2 \le 24\) is binding

(b) \(x_1 - x_2 \le v\) for \(v < -3\) is binding and for \(v < -6\) yields the empty set.

(c) \(x_2 \le v\) for \(v \ge 8\) is nonbinding and for \(v < 0\) yields the empty set.

7. A trucking firm ships the containers of two companies, A and B. Each container from company A weighs 
40 pounds and is 2 cubic feet in volume. Each container from company B weighs 50 pounds and is 3 cubic 
feet in volume. The trucking firm charges company A $2.20 for each container shipped and charges 
company B $3.00 for each container shipped. If one of the firm's trucks cannot carry more than 37,000 
pounds and cannot hold more than 2000 cubic feet, how many containers from companies A and B should 
a truck carry to maximize the shipping charges? 

Answer: 

550 containers from company A and 300 containers from company B; maximum shipping 
charges = $2110 

8. Repeat Exercise 7 if the trucking firm raises its price for shipping a container from company A to $2.50.

Answer: 

925 containers from company A and no containers from company B; maximum shipping 
charges = $2312.50 

9. A manufacturer produces sacks of chicken feed from two ingredients, A and B. Each sack is to contain at least 10 ounces of nutrient \(N_1\), at least 8 ounces of nutrient \(N_2\), and at least 12 ounces of nutrient \(N_3\). Each pound of ingredient A contains 2 ounces of nutrient \(N_1\), 2 ounces of nutrient \(N_2\), and 6 ounces of nutrient \(N_3\). Each pound of ingredient B contains 5 ounces of nutrient \(N_1\), 3 ounces of nutrient \(N_2\), and 4 ounces of nutrient \(N_3\). If ingredient A costs 8 cents per pound and ingredient B costs 9 cents per pound, how much of each ingredient should the manufacturer use in each sack of feed to minimize his costs?

Answer:

0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost = 24.8¢

10. If the objective function of a linear programming problem has the same value at two adjacent extreme points, show that it has the same value at all points on the straight line segment connecting the two extreme points. [Hint: If \((x_1', x_2')\) and \((x_1'', x_2'')\) are any two points in the plane, a point \((x_1, x_2)\) lies on the straight line segment connecting them if

\[
x_1 = tx_1' + (1 - t)x_1'' \quad\text{and}\quad x_2 = tx_2' + (1 - t)x_2''
\]

where \(t\) is a number in the interval \([0, 1]\).]

11. Show that the objective function in Example 9 attains a maximum value in the feasible set. [Hint: Examine the level curves of the objective function.]


Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica , Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 

T1. Consider the feasible region consisting of \(0 \le x\), \(0 \le y\) along with the set of inequalities



for k = 0, 1, 21. Maximize the objective function 

\[
z = 3x + 4y
\]

assuming that (a) \(n = 1\), (b) \(n = 2\), (c) \(n = 3\), (d) \(n = 4\), (e) \(n = 5\), (f) \(n = 6\), (g) \(n = 7\), (h) \(n = 8\), (i) \(n = 9\), (j) \(n = 10\), and (k) \(n = {?}\). (l) Next, maximize this objective function using the nonlinear feasible region,

\(0 \le x\), \(0 \le y\), and



(m) Let the results of parts (a) through (k) begin a sequence of values for \(z_{\max}\). Do these values approach the value determined in part (l)? Explain.

T2. Repeat Exercise T1 using the objective function \(z = x + y\).
10.3 The Earliest Applications of Linear Algebra


Linear systems can be found in the earliest writings of many ancient civilizations. In this section we give some examples of the types of problems that they used to solve.


Prerequisites 

Linear Systems 

The practical problems of early civilizations included the measurement of land, the distribution of goods, the 
tracking of resources such as wheat and cattle, and taxation and inheritance calculations. In many cases, these 
problems led to linear systems of equations since linearity is one of the simplest relationships that can exist 
among variables. In this section we present examples from five diverse ancient cultures illustrating how they 
used and solved systems of linear equations. We restrict ourselves to examples before A.D. 500. These
examples consequently predate the development of the field of algebra by Islamic/Arab mathematicians, a 
field that ultimately led in the nineteenth century to the branch of mathematics now called linear algebra. 


EXAMPLE 1 Egypt (about 1650 B.C.)



Problem 40 of the Ahmes Papyrus 


The Ahmes (or Rhind) Papyrus is the source of most of our information about ancient Egyptian 
mathematics. This 5-meter-long papyrus contains 84 short mathematical problems, together 
with their solutions, and dates from about 1650 B.C. Problem 40 in this papyrus is the following: 

Divide 100 hekats of barley among five men in arithmetic progression so that the sum of 
the two smallest is one-seventh the sum of the three largest. 

Let \(a\) be the least amount that any man obtains, and let \(d\) be the common difference of the terms in the arithmetic progression. Then the other four men receive \(a + d\), \(a + 2d\), \(a + 3d\), and \(a + 4d\) hekats. The two conditions of the problem require that

\[
\begin{aligned}
a + (a + d) + (a + 2d) + (a + 3d) + (a + 4d) &= 100 \\
\tfrac{1}{7}\left[(a + 2d) + (a + 3d) + (a + 4d)\right] &= a + (a + d)
\end{aligned}
\]

These equations reduce to the following system of two equations in two unknowns:

\[
\begin{aligned}
5a + 10d &= 100 \\
11a - 2d &= 0
\end{aligned} \tag{1}
\]


The solution technique described in the papyrus is known as the method of false position or false assumption. It begins by assuming some convenient value of \(a\) (in our case \(a = 1\)), substituting that value into the second equation, and obtaining \(d = 11/2\). Substituting \(a = 1\) and \(d = 11/2\) into the left-hand side of the first equation gives 60, whereas the right-hand side is 100. Adjusting the initial guess for \(a\) by multiplying it by 100/60 leads to the correct value \(a = 5/3\). Substituting \(a = 5/3\) into the second equation then gives \(d = 55/6\), so the quantities of barley received by the five men are 10/6, 65/6, 120/6, 175/6, and 230/6 hekats. This technique of guessing a value of an unknown and later adjusting it has been used by many cultures throughout the ages.
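The false-position computation is easy to replay with exact fractions. It works here because the second equation of (1) is homogeneous. Scaling a trial solution \((a, d)\) by some factor therefore scales the left-hand side of the first equation by the same factor. A sketch follows; `d_from_a` is a hypothetical helper name.

```python
from fractions import Fraction as F

def d_from_a(a):
    """Second equation of (1), 11a - 2d = 0, solved for d."""
    return F(11, 2) * a

guess = F(1)                                     # the convenient trial value a = 1
trial_total = 5 * guess + 10 * d_from_a(guess)   # left side of 5a + 10d = 100: gives 60
a = guess * 100 / trial_total                    # rescale the guess by 100/60
d = d_from_a(a)
shares = [a + k * d for k in range(5)]
print(a, d, [str(s) for s in shares])
```

Fractions are printed in lowest terms, so 10/6 appears as 5/3 and 120/6 as 20.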


EXAMPLE 2 Babylonia (1900-1600 B.C.)



The Old Babylonian Empire flourished in Mesopotamia between 1900 and 1600 B.C. Many clay 
tablets containing mathematical tables and problems survive from that period, one of which 
(designated Ca MLA 1950) contains the next problem. The statement of the problem is a bit 
muddled because of the condition of the tablet, but the diagram and the solution on the tablet 
indicate that the problem is as follows: 






A trapezoid with an area of 320 square units is cut off from a right triangle by a line
parallel to one of its sides. The other side has length 50 units, and the height of the 
trapezoid is 20 units. What are the upper and the lower widths of the trapezoid? 


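Let \(x\) be the lower width of the trapezoid and \(y\) its upper width. The area of the trapezoid is its height times its average width, so \(20 \times \tfrac{x + y}{2} = 320\). Using similar triangles, we also have \(\tfrac{x}{50} = \tfrac{y}{30}\), since the small triangle left above the trapezoid has height \(50 - 20 = 30\). The solution on the tablet uses these relations to generate the linear system

\[
\begin{aligned}
\tfrac{1}{2}(x + y) &= 16 \\
\tfrac{1}{2}(x - y) &= 4
\end{aligned} \tag{2}
\]

Adding and subtracting these two equations then gives the solution \(x = 20\) and \(y = 12\).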


EXAMPLE 3 China (A.D. 263)

Chiu Chang Suan Shu in Chinese characters

The most important treatise in the history of Chinese mathematics is the Chiu Chang Suan Shu, or “The Nine Chapters of the Mathematical Art.” This treatise, which is a collection of 246 problems and their solutions, was assembled in its final form by Liu Hui in A.D. 263. Its contents, however, go back to at least the beginning of the Han dynasty in the second century B.C. The eighth of its nine chapters, entitled “The Way of Calculating by Arrays,” contains 18 word problems that lead to linear systems in three to six unknowns. The general solution procedure described is almost identical to the Gaussian elimination technique developed in Europe in the nineteenth century by Carl Friedrich Gauss. The first problem in the eighth chapter is the following:


There are three classes of corn, of which three bundles of the first class, two of the 
second, and one of the third make 39 measures. Two of the first, three of the second, and 
one of the third make 34 measures. And one of the first, two of the second, and three of 
the third make 26 measures. How many measures of grain are contained in one bundle 
of each class? 


Let \(x\), \(y\), and \(z\) be the measures of the first, second, and third classes of corn. Then the conditions of the problem lead to the following linear system of three equations in three unknowns:

\[
\begin{aligned}
3x + 2y + z &= 39 \\
2x + 3y + z &= 34 \\
x + 2y + 3z &= 26
\end{aligned} \tag{3}
\]


The solution described in the treatise represented the coefficients of each equation by an 
appropriate number of rods placed within squares on a counting table. Positive coefficients 
were represented by black rods, negative coefficients were represented by red rods, and the 
squares corresponding to zero coefficients were left empty. The counting table was laid out as 
follows so that the coefficients of each equation appear in columns with the first equation in the 
rightmost column: 


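\[
\begin{array}{|c|c|c|}
\hline
1 & 2 & 3 \\ \hline
2 & 3 & 2 \\ \hline
3 & 1 & 1 \\ \hline
26 & 34 & 39 \\ \hline
\end{array}
\]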


Next, the numbers of rods within the squares were adjusted to accomplish the following two 
steps: (1) two times the numbers of the third column were subtracted from three times the 
numbers in the second column and (2) the numbers in the third column were subtracted from 
three times the numbers in the first column. The result was the following array: 




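\[
\begin{array}{|c|c|c|}
\hline
 & & 3 \\ \hline
4 & 5 & 2 \\ \hline
8 & 1 & 1 \\ \hline
39 & 24 & 39 \\ \hline
\end{array}
\]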


In this array, four times the numbers in the second column were subtracted from five times the 
numbers in the first column, yielding 




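\[
\begin{array}{|c|c|c|}
\hline
 & & 3 \\ \hline
 & 5 & 2 \\ \hline
36 & 1 & 1 \\ \hline
99 & 24 & 39 \\ \hline
\end{array}
\]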


This last array is equivalent to the linear system 




















\[
\begin{aligned}
3x + 2y + z &= 39 \\
5y + z &= 24 \\
36z &= 99
\end{aligned}
\]


This triangular system was solved by a method equivalent to back substitution to obtain \(x = 37/4\), \(y = 17/4\), and \(z = 11/4\).
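The counting-board steps translate directly into column operations. These use only multiplication and subtraction, and no division is needed until the back substitution. A sketch in Python follows. The column names `left`, `middle`, and `right` follow the board layout above, with the first equation on the right, and `combine` is a hypothetical helper.

```python
from fractions import Fraction as F

# Counting-board columns, left to right. Each lists the coefficients of
# x, y, z and then the number of measures; equation 1 is the rightmost column.
left, middle, right = [1, 2, 3, 26], [2, 3, 1, 34], [3, 2, 1, 39]

def combine(p, u, q, v):
    """Entrywise p*u - q*v: the rod adjustments use no division."""
    return [p * s - q * t for s, t in zip(u, v)]

middle = combine(3, middle, 2, right)   # step (1): gives [0, 5, 1, 24]
left = combine(3, left, 1, right)       # step (2): gives [0, 4, 8, 39]
left = combine(5, left, 4, middle)      # gives [0, 0, 36, 99]

# Back substitution on 36z = 99, 5y + z = 24, 3x + 2y + z = 39
z = F(left[3], left[2])
y = (middle[3] - middle[2] * z) / middle[1]
x = (right[3] - right[1] * y - right[2] * z) / right[0]
print(x, y, z)   # 37/4 17/4 11/4
```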


EXAMPLE 4 Greece (third century B.C.)



Archimedes c. 287-212 B.C. 


Perhaps the most famous system of linear equations from antiquity is the one associated with 
the first part of Archimedes' celebrated Cattle Problem. This problem supposedly was posed by 
Archimedes as a challenge to his colleague Eratosthenes. No solution has come down to us 
from ancient times, so that it is not known how, or even whether, either of these two geometers 
solved it. 

If thou art diligent and wise, O stranger, compute the number of cattle of the Sun, who 
once upon a time grazed on the fields of the Thrinacian isle of Sicily, divided into four 
herds of different colors, one milk white, another glossy black, a third yellow, and the 
last dappled. In each herd were bulls, mighty in number according to these proportions: 
Understand, stranger, that the white bulls were equal to a half and a third of the black 
together with the whole of the yellow, while the black were equal to the fourth part of 
the dappled and a fifth, together with, once more, the whole of the yellow. Observe 
further that the remaining bulls, the dappled, were equal to a sixth part of the white and 
a seventh, together with all of the yellow. These were the proportions of the cows: The 
white were precisely equal to the third part and a fourth of the whole herd of the black; 
while the black were equal to the fourth part once more of the dappled and with it a 


fifth part, when all, including the bulls, went to pasture together. Now the dappled in
four parts were equal in number to a fifth part and a sixth of the yellow herd. Finally 
the yellow were in number equal to a sixth part and a seventh of the white herd. If thou 
canst accurately tell, O stranger, the number of cattle of the Sun, giving separately the 
number of well-fed bulls and again the number of females according to each color, thou
wouldst not be called unskilled or ignorant of numbers, but not yet shalt thou be 
numbered among the wise. 


The conventional designation of the eight variables in this problem is

\(W =\) number of white bulls
\(B =\) number of black bulls
\(Y =\) number of yellow bulls
\(D =\) number of dappled bulls
\(w =\) number of white cows
\(b =\) number of black cows
\(y =\) number of yellow cows
\(d =\) number of dappled cows


The problem can now be stated as the following seven homogeneous equations in eight unknowns:

1. \(W = \left(\tfrac{1}{2} + \tfrac{1}{3}\right)B + Y\)
(The white bulls were equal to a half and a third of the black [bulls] together with the whole of the yellow [bulls].)

2. \(B = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)D + Y\)
(The black [bulls] were equal to the fourth part of the dappled [bulls] and a fifth, together with, once more, the whole of the yellow [bulls].)

3. \(D = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)W + Y\)
(The remaining bulls, the dappled, were equal to a sixth part of the white [bulls] and a seventh, together with all of the yellow [bulls].)

4. \(w = \left(\tfrac{1}{3} + \tfrac{1}{4}\right)(B + b)\)
(The white [cows] were precisely equal to the third part and a fourth of the whole herd of the black.)

5. \(b = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)(D + d)\)
(The black [cows] were equal to the fourth part once more of the dappled and with it a fifth part, when all, including the bulls, went to pasture together.)

6. \(d = \left(\tfrac{1}{5} + \tfrac{1}{6}\right)(Y + y)\)
(The dappled [cows] in four parts [that is, in totality] were equal in number to a fifth part and a sixth of the yellow herd.)

7. \(y = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)(W + w)\)
(The yellow [cows] were in number equal to a sixth part and a seventh of the white herd.)


As we ask you to show in the exercises, this system has infinitely many solutions of the form

\[
\begin{aligned}
W &= 10{,}366{,}482k \\
B &= 7{,}460{,}514k \\
Y &= 4{,}149{,}387k \\
D &= 7{,}358{,}060k \\
w &= 7{,}206{,}360k \\
b &= 4{,}893{,}246k \\
y &= 5{,}439{,}213k \\
d &= 3{,}515{,}820k
\end{aligned}
\]


where k is any real number. The values Ar = 1, 2,_give infinitely many positive integer 

solutions to the problem, with £ = 1 giving the smallest solution. 
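The claimed solution can be checked mechanically. The following minimal Python sketch (an illustration, not part of the original text) substitutes the $k = 1$ values into the seven equations using exact rational arithmetic from the standard `fractions` module, so no rounding can hide a mismatch:

```python
from fractions import Fraction as F

# Smallest solution (k = 1), as listed above.
W, B, Y, D = 10_366_482, 7_460_514, 4_149_387, 7_358_060
w, b, y, d = 7_206_360, 4_893_246, 5_439_213, 3_515_820

# The seven homogeneous equations, each stored as (left side, right side).
equations = [
    (W, (F(1, 2) + F(1, 3)) * B + Y),
    (B, (F(1, 4) + F(1, 5)) * D + Y),
    (D, (F(1, 6) + F(1, 7)) * W + Y),
    (w, (F(1, 3) + F(1, 4)) * (B + b)),
    (b, (F(1, 4) + F(1, 5)) * (D + d)),
    (d, (F(1, 5) + F(1, 6)) * (Y + y)),
    (y, (F(1, 6) + F(1, 7)) * (W + w)),
]

all_hold = all(lhs == rhs for lhs, rhs in equations)
total = W + B + Y + D + w + b + y + d   # size of the whole herd when k = 1
print(all_hold, total)
```

Because every right-hand side is linear in the unknowns, multiplying all eight values by the same $k$ keeps each equation true, which is why the solutions form the one-parameter family above.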


EXAMPLE 5 India (fourth century A.D.)



Fragment III-5-3v of the Bakhshali Manuscript 

The Bakhshali Manuscript is an ancient work of Indian/Hindu mathematics dating from around 
the fourth century A.D., although some of its materials undoubtedly come from many centuries 
before. It consists of about 70 leaves or sheets of birch bark containing mathematical problems 
and their solutions. Many of its problems are so-called equalization problems that lead to 
systems of linear equations. One such problem on the fragment shown is the following: 


One merchant has seven asava horses, a second has nine haya horses, and a third has 
ten camels. They are equally well off in the value of their animals if each gives two 
animals, one to each of the others. Find the price of each animal and the total value of 
the animals possessed by each merchant. 


Let $x$ be the price of an asava horse, let $y$ be the price of a haya horse, let $z$ be the price of a camel, and let $K$ be the total value of the animals possessed by each merchant. Then the conditions of the problem lead to the following system of equations:

$$\begin{aligned} 5x + y + z &= K \\ x + 7y + z &= K \\ x + y + 8z &= K \end{aligned} \tag{5}$$

The method of solution described in the manuscript begins by subtracting the quantity $(x + y + z)$ from both sides of the three equations to obtain $4x = 6y = 7z = K - (x + y + z)$. This shows that if the prices $x$, $y$, and $z$ are to be integers, then the quantity $K - (x + y + z)$ must be an integer that is divisible by 4, 6, and 7. The manuscript takes the product of these three numbers, or 168, for the value of $K - (x + y + z)$, which yields $x = 42$, $y = 28$, and $z = 24$ for the prices and $K = 262$ for the total value. (See Exercise 6 for more solutions to this problem.)
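Divisibility by 4, 6, and 7 only requires $K - (x + y + z)$ to be a multiple of their least common multiple, 84, rather than of their product, 168. A short Python sketch of that observation follows; the function name `bakhshali_solution` is hypothetical, and `math.lcm` with several arguments requires Python 3.9 or later:

```python
from math import lcm


def bakhshali_solution(multiple=1):
    """Integer solution of 5x+y+z = x+7y+z = x+y+8z = K.

    Subtracting x+y+z from each equation gives 4x = 6y = 7z = s with
    s = K - (x+y+z). Integer prices need s to be a multiple of 4, 6 and 7;
    `multiple` selects which multiple of lcm(4, 6, 7) = 84 to use.
    """
    s = multiple * lcm(4, 6, 7)
    x, y, z = s // 4, s // 6, s // 7
    K = s + x + y + z
    return x, y, z, K


smallest = bakhshali_solution(1)     # s = 84
manuscript = bakhshali_solution(2)   # s = 168, the manuscript's choice
print(smallest, manuscript)
```

With `multiple=2` the sketch reproduces the manuscript's prices; `multiple=1` gives a smaller solution, which is the subject of Exercise 6(b).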


Exercise Set 10.3 

1. The following lines from Book 12 of Homer's Odyssey relate a precursor of Archimedes' Cattle Problem: 

Thou shalt ascend the isle triangular, 

Where many oxen of the Sun are fed, 

And fatted flocks. Of oxen fifty head 
In every herd feed, and their herds are seven; 

And of his fat flocks is their number even. 

The last line means that there are as many sheep in all the flocks as there are oxen in all the herds. What is 
the total number of oxen and sheep that belong to the god of the Sun? (This was a difficult problem in 
Homer's day.) 

Answer: 

700 

2. Solve the following problems from the Bakhshali Manuscript. 

(a) B possesses two times as much as A; C has three times as much as A and B together; D has four times 
as much as A, B, and C together. Their total possessions are 300. What is the possession of A? 

(b) B gives 2 times as much as A; C gives 3 times as much as B; D gives 4 times as much as C. Their total 
gift is 132. What is the gift of A? 

Answer: 

(a) 5 

(b) 4 

3. A problem on a Babylonian tablet requires finding the length and width of a rectangle given that the length 
and the width add up to 10, while the length and one-fourth of the width add up to 7. The solution 
provided on the tablet consists of the following four statements: 


Multiply 7 by 4 to obtain 28. 


Take away 10 from 28 to obtain 18. 

Take one-third of 18 to obtain 6, the length.

Take away 6 from 10 to obtain 4, the width. 

Explain how these steps lead to the answer. 

4. The following two problems are from “The Nine Chapters of the Mathematical Art.” Solve them using the array technique described in Example 3.

(a) Five oxen and two sheep are worth 10 units and two oxen and five sheep are worth 8 units. What is the value of each ox and sheep?

(b) There are three kinds of corn. The grains contained in two, three, and four bundles, respectively, of these three classes of corn, are not sufficient to make a whole measure. However, if we added to them one bundle of the second, third, and first classes, respectively, then the grains would become one full measure in each case. How many measures of grain does each bundle of the different classes contain?

Answer:

(a) Ox, $\tfrac{34}{21}$ units; sheep, $\tfrac{20}{21}$ unit

(b) First kind, $\tfrac{9}{25}$ measure; second kind, $\tfrac{7}{25}$ measure; third kind, $\tfrac{4}{25}$ measure
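The array technique of the Nine Chapters is, in modern terms, Gaussian elimination. The sketch below is a modern equivalent rather than a transcription of the ancient column arrays: it applies row-oriented Gauss-Jordan elimination with exact fractions to both parts of this exercise. The helper name `solve_exact` is hypothetical.

```python
from fractions import Fraction


def solve_exact(A, rhs):
    """Solve a square system A x = rhs exactly by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        # Find a row at or below `col` with a nonzero pivot and swap it up.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n] for row in M]


# (a) 5 oxen + 2 sheep = 10 units, 2 oxen + 5 sheep = 8 units.
ox, sheep = solve_exact([[5, 2], [2, 5]], [10, 8])

# (b) 2 first + 1 second = 1, 3 second + 1 third = 1, 4 third + 1 first = 1.
first, second, third = solve_exact([[2, 1, 0], [0, 3, 1], [1, 0, 4]], [1, 1, 1])
print(ox, sheep, first, second, third)
```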

5. This problem in part (a) is known as the “Flower of Thymaridas,” named after a Pythagorean of the fourth century B.C.

(a) Given the $n$ numbers $a_1, a_2, \ldots, a_n$, solve for $x_1, x_2, \ldots, x_n$ in the following linear system:

$$\begin{aligned} x_1 + x_2 + \cdots + x_n &= a_1 \\ x_1 + x_2 &= a_2 \\ x_1 + x_3 &= a_3 \\ &\;\;\vdots \\ x_1 + x_n &= a_n \end{aligned}$$

(b) Identify a problem in this exercise set that fits the pattern in part (a), and solve it using your general solution.

Answer:

(a) $x_1 = \dfrac{(a_2 + a_3 + \cdots + a_n) - a_1}{n - 2}$, $x_i = a_i - x_1$, $i = 2, 3, \ldots, n$

(b) Exercise 7(b); gold, $30\tfrac{1}{2}$ minae; brass, $9\tfrac{1}{2}$ minae; tin, $14\tfrac{1}{2}$ minae; iron, $5\tfrac{1}{2}$ minae
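The general solution in part (a) follows by adding the last $n - 1$ equations, which gives $(n - 1)x_1 + (x_2 + \cdots + x_n) = a_2 + \cdots + a_n$, and subtracting the first equation. The Python sketch below (function name hypothetical) implements that formula with exact fractions and applies it to the crown problem of Exercise 7(b), where the pairwise totals are $\tfrac{2}{3}$, $\tfrac{3}{4}$, and $\tfrac{3}{5}$ of 60 minae:

```python
from fractions import Fraction


def flower_of_thymaridas(a):
    """Solve x1 + ... + xn = a1 and x1 + xi = ai (i = 2, ..., n)."""
    n = len(a)
    if n < 3:
        raise ValueError("the formula divides by n - 2, so n must be at least 3")
    x1 = Fraction(sum(a[1:]) - a[0], n - 2)
    return [x1] + [ai - x1 for ai in a[1:]]


# Exercise 7(b): whole crown 60; gold+brass 40, gold+tin 45, gold+iron 36.
gold, brass, tin, iron = flower_of_thymaridas([60, 40, 45, 36])
print(gold, brass, tin, iron)
```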


6. For Example 5 from the Bakhshali Manuscript:

(a) Express Equations 5 as a homogeneous linear system of three equations in four unknowns ($x$, $y$, $z$, and $K$) and show that the solution set has one arbitrary parameter.

(b) Find the smallest solution for which all four variables are positive integers.

(c) Show that the solution given in Example 5 is included among your solutions.

Answer:

(a)
$$\begin{aligned} 5x + y + z - K &= 0 \\ x + 7y + z - K &= 0 \\ x + y + 8z - K &= 0 \end{aligned}$$
$x = \tfrac{21}{131}t$, $y = \tfrac{14}{131}t$, $z = \tfrac{12}{131}t$, $K = t$, where $t$ is an arbitrary number

(b) Take $t = 131$, so that $x = 21$, $y = 14$, $z = 12$, $K = 131$.

(c) Take $t = 262$, so that $x = 42$, $y = 28$, $z = 24$, $K = 262$.


7. Solve the problems posed in the following three epigrams, which appear in a collection entitled “The 
Greek Anthology,” compiled in part by a scholar named Metrodorus around A.D. 500. Some of its 46 
mathematical problems are believed to date as far back as 600 B.C. [Note: Before solving parts (a) and (c), 
you will have to formulate the question.] 

(a) I desire my two sons to receive the thousand staters of which I am possessed, but let the fifth part of 
the legitimate one's share exceed by ten the fourth part of what falls to the illegitimate one. 

(b) Make me a crown weighing sixty minae, mixing gold and brass, and with them tin and much-wrought 
iron. Let the gold and brass together form two-thirds, the gold and tin together three-fourths, and the 
gold and iron three-fifths. Tell me how much gold you must put in, how much brass, how much tin, 
and how much iron, so as to make the whole crown weigh sixty minae. 

(c) First person: I have what the second has and the third of what the third has. Second person: I have 
what the third has and the third of what the first has. Third person: And I have ten minae and the third 
of what the second has. 


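Answer:

(a) Legitimate son, $577\tfrac{7}{9}$ staters; illegitimate son, $422\tfrac{2}{9}$ staters

(b) Gold, $30\tfrac{1}{2}$ minae; brass, $9\tfrac{1}{2}$ minae; tin, $14\tfrac{1}{2}$ minae; iron, $5\tfrac{1}{2}$ minae

(c) First person, 45 minae; second person, $37\tfrac{1}{2}$ minae; third person, $22\tfrac{1}{2}$ minae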


Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB,
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 





T1.

(a) Solve Archimedes' Cattle Problem using a symbolic algebra program.

(b) The Cattle Problem has a second part in which two additional conditions are imposed. The first of these states that “When the white bulls mingled their number with the black, they stood firm, equal in depth and breadth.” This requires that $W + B$ be a square number, that is, 1, 4, 9, 16, 25, and so on. Show that this requires that the values of $k$ in Eq. 4 be restricted as follows:

$$k = 4{,}456{,}749\,r^2, \qquad r = 1, 2, 3, \ldots$$

and find the smallest total number of cattle that satisfies this second condition.

The second condition imposed in the second part of the Cattle Problem states that “When the yellow and the dappled bulls were gathered into one herd, they stood in such a manner that their number, beginning from one, grew slowly greater ’til it completed a triangular figure.” This requires that the quantity $Y + D$ be a triangular number, that is, a number of the form $1,\ 1 + 2,\ 1 + 2 + 3,\ 1 + 2 + 3 + 4, \ldots$. This final part of the problem was not completely solved until 1965, when all 206,545 digits of the smallest number of cattle that satisfies this condition were found using a computer.
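For part (b), the square condition reduces to a question about the squarefree part of $W + B$ at $k = 1$. The sketch below is an illustration, not the intended symbolic-algebra solution; it assumes $k$ is a positive integer and, as the text states, that the $k = 1$ values are the smallest integer solution. The helper `squarefree_part` is hypothetical and uses plain trial division, which is adequate for an eight-digit number:

```python
from math import isqrt


def squarefree_part(n):
    """Return n divided by its largest square factor (trial division)."""
    part, p = 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent % 2 == 1:
            part *= p
        p += 1
    return part * n  # whatever is left over is 1 or a prime


# Herd sizes for k = 1: bulls W, B, Y, D, then cows w, b, y, d.
k1 = [10_366_482, 7_460_514, 4_149_387, 7_358_060,
      7_206_360, 4_893_246, 5_439_213, 3_515_820]
W1, B1 = k1[0], k1[1]

# (W1 + B1) * k is a perfect square exactly when k is the squarefree part
# of W1 + B1 times a perfect square r**2.
k_step = squarefree_part(W1 + B1)
smallest_total = sum(k1) * k_step   # r = 1
print(k_step, smallest_total)
```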

T2. The following problem is from “The Nine Chapters of the Mathematical Art” and determines a 
homogeneous linear system of five equations in six unknowns. Show that the system has infinitely many 
solutions, and find the one for which the depth of the well and the lengths of the five ropes are the smallest 
possible positive integers. 

Suppose that five families share a well. Suppose further that 

2 of A’s ropes are short of the well’s depth by one of B’s ropes.

3 of B’s ropes are short of the well’s depth by one of C’s ropes. 

4 of C’s ropes are short of the well’s depth by one of D's ropes. 

5 of D's ropes are short of the well's depth by one of E’s ropes. 

6 of E's ropes are short of the well’s depth by one of A's ropes. 
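One way to attack T2 without a symbolic package is to set the depth to 1, express each rope in terms of A's rope by back-substitution, and then scale by the least common denominator. The Python sketch below does this with exact fractions; the variable names are illustrative, and `math.lcm` with several arguments needs Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

# Take the depth H = 1 and store each rope length as (p, q), meaning p*A + q.
# "n of X's ropes are short of the depth by one of Y's" means n*X + Y = H,
# so Y = H - n*X.
lengths = [(Fraction(1), Fraction(0))]      # A itself
for n in (2, 3, 4, 5):                      # produces B, C, D, E in turn
    p, q = lengths[-1]
    lengths.append((-n * p, 1 - n * q))

# Last condition: 6E + A = 1, i.e. (6*pE + 1) * A + 6*qE = 1.
pE, qE = lengths[-1]
A_val = (1 - 6 * qE) / (6 * pE + 1)
ropes = [p * A_val + q for p, q in lengths]  # A, B, C, D, E when H = 1

# Every solution is a multiple of this one; scale to the smallest integers.
scale = lcm(*(r.denominator for r in ropes))
depth = scale
rope_ints = [int(r * scale) for r in ropes]
print(depth, rope_ints)
```

The free choice of $H$ is the one arbitrary parameter of this homogeneous system of five equations in six unknowns, which is why it has infinitely many solutions.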
10.4 Cubic Spline Interpolation

In this section an artist’s drafting aid is used as a physical model for the mathematical problem of finding a curve that passes 
through specified points in the plane. The parameters of the curve are determined by solving a linear system of equations. 


Prerequisites 

Linear Systems 
Matrix Algebra 
Differential Calculus 


Curve Fitting 

Fitting a curve through specified points in the plane is a common problem encountered in analyzing experimental data, in 
ascertaining the relations among variables, and in design work. A ubiquitous application is in the design and description of 
computer and printer fonts, such as PostScript™ and TrueType™ fonts (Figure 10.4.1). In Figure 10.4.2 seven points in the 
xy-plane are displayed, and in Figure 10.4.4 a smooth curve has been drawn that passes through them. A curve that passes 
through a set of points in the plane is said to interpolate those points, and the curve is called an interpolating curve for those 
points. The interpolating curve in Figure 10.4.4 was drawn with the aid of a drafting spline (Figure 10.4.3). This drafting aid 
consists of a thin, flexible strip of wood or other material that is bent to pass through the points to be interpolated. Attached 
sliding weights hold the spline in position while the artist draws the interpolating curve. The drafting spline will serve as the 
physical model for a mathematical theory of interpolation that we will discuss in this section. 


Figure 10.4.1

Figure 10.4.2

Figure 10.4.3

Figure 10.4.4


Statement of the Problem 

Suppose that we are given $n$ points in the $xy$-plane,

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

which we wish to interpolate with a “well-behaved” curve (Figure 10.4.5). For convenience, we take the points to be equally spaced in the $x$-direction, although our results can easily be extended to the case of unequally spaced points. If we let the common distance between the $x$-coordinates of the points be $h$, then we have

$$x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h$$

Let $y = S(x)$, $x_1 \le x \le x_n$, denote the interpolating curve that we seek. We assume that this curve describes the displacement of a drafting spline that interpolates the $n$ points when the weights holding down the spline are situated precisely at the $n$ points. It is known from linear beam theory that for small displacements, the fourth derivative of the displacement of a beam is zero along any interval of the $x$-axis that contains no external forces acting on the beam. If we treat our drafting spline as a thin beam and realize that the only external forces acting on it arise from the weights at the $n$ specified points, then it follows that

$$S^{(iv)}(x) = 0 \tag{1}$$

for values of $x$ lying in the $n - 1$ open intervals

$$(x_1, x_2),\ (x_2, x_3),\ \ldots,\ (x_{n-1}, x_n)$$

between the $n$ points.



We also need the result from linear beam theory that states that for a beam acted upon only by external forces, the displacement must have two continuous derivatives. In the case of the interpolating curve $y = S(x)$ constructed by the drafting spline, this means that $S(x)$, $S'(x)$, and $S''(x)$ must be continuous for $x_1 \le x \le x_n$.

The condition that $S''(x)$ be continuous is what causes a drafting spline to produce a pleasing curve, as it results in continuous curvature. The eye can perceive sudden changes in curvature (that is, discontinuities in $S''(x)$), but sudden changes in higher derivatives are not discernible. Thus, the condition that $S''(x)$ be continuous is the minimal prerequisite for the interpolating curve to be perceptible as a single smooth curve, rather than as a series of separate curves pieced together.


To determine the mathematical form of the function $S(x)$, we observe that because $S^{(iv)}(x) = 0$ in the intervals between the $n$ specified points, it follows by integrating this equation four times that $S(x)$ must be a cubic polynomial in $x$ in each such interval. In general, however, $S(x)$ will be a different cubic polynomial in each interval, so $S(x)$ must have the form

$$S(x) = \begin{cases} S_1(x), & x_1 \le x \le x_2 \\ S_2(x), & x_2 \le x \le x_3 \\ \quad\vdots & \quad\vdots \\ S_{n-1}(x), & x_{n-1} \le x \le x_n \end{cases} \tag{2}$$

where $S_1(x), S_2(x), \ldots, S_{n-1}(x)$ are cubic polynomials. For convenience, we will write these in the form

$$\begin{aligned} S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\ S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\ &\;\;\vdots \\ S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n \end{aligned} \tag{3}$$

The $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s constitute a total of $4n - 4$ coefficients that we must determine to specify $S(x)$ completely. If we choose these coefficients so that $S(x)$ interpolates the $n$ specified points in the plane and $S(x)$, $S'(x)$, and $S''(x)$ are continuous, then the resulting interpolating curve is called a cubic spline.


Derivation of the Formula of a Cubic Spline 


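From Equations 2 and 3, we have

$$\begin{aligned} S(x) = S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\ S(x) = S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\ &\;\;\vdots \\ S(x) = S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n \end{aligned} \tag{4}$$

$$\begin{aligned} S'(x) = S_1'(x) &= 3a_1(x - x_1)^2 + 2b_1(x - x_1) + c_1, & x_1 \le x \le x_2 \\ S'(x) = S_2'(x) &= 3a_2(x - x_2)^2 + 2b_2(x - x_2) + c_2, & x_2 \le x \le x_3 \\ &\;\;\vdots \\ S'(x) = S_{n-1}'(x) &= 3a_{n-1}(x - x_{n-1})^2 + 2b_{n-1}(x - x_{n-1}) + c_{n-1}, & x_{n-1} \le x \le x_n \end{aligned} \tag{5}$$

and

$$\begin{aligned} S''(x) = S_1''(x) &= 6a_1(x - x_1) + 2b_1, & x_1 \le x \le x_2 \\ S''(x) = S_2''(x) &= 6a_2(x - x_2) + 2b_2, & x_2 \le x \le x_3 \\ &\;\;\vdots \\ S''(x) = S_{n-1}''(x) &= 6a_{n-1}(x - x_{n-1}) + 2b_{n-1}, & x_{n-1} \le x \le x_n \end{aligned} \tag{6}$$

We will now use these equations and the four properties of cubic splines stated below to express the unknown coefficients $a_i, b_i, c_i, d_i$, $i = 1, 2, \ldots, n - 1$, in terms of the known coordinates $y_1, y_2, \ldots, y_n$.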

1. $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$.

Because $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, we have

$$S(x_1) = y_1,\quad S(x_2) = y_2,\quad \ldots,\quad S(x_n) = y_n \tag{7}$$

From the first $n - 1$ of these equations and 4, we obtain

$$d_1 = y_1,\quad d_2 = y_2,\quad \ldots,\quad d_{n-1} = y_{n-1} \tag{8}$$

From the last equation in 7, the last equation in 4, and the fact that $x_n - x_{n-1} = h$, we obtain

$$a_{n-1}h^3 + b_{n-1}h^2 + c_{n-1}h + d_{n-1} = y_n \tag{9}$$

2. $S(x)$ is continuous on $[x_1, x_n]$.

Because $S(x)$ is continuous for $x_1 \le x \le x_n$, it follows that at each point $x_i$ in the set $x_2, x_3, \ldots, x_{n-1}$ we must have

$$S_{i-1}(x_i) = S_i(x_i), \qquad i = 2, 3, \ldots, n - 1 \tag{10}$$

Otherwise, the graphs of $S_{i-1}(x)$ and $S_i(x)$ would not join together to form a continuous curve at $x_i$. When we apply the interpolating property $S_i(x_i) = y_i$, it follows from 10 that $S_{i-1}(x_i) = y_i$, $i = 2, 3, \ldots, n - 1$, or from 4 that

$$\begin{aligned} a_1h^3 + b_1h^2 + c_1h + d_1 &= y_2 \\ a_2h^3 + b_2h^2 + c_2h + d_2 &= y_3 \\ &\;\;\vdots \\ a_{n-2}h^3 + b_{n-2}h^2 + c_{n-2}h + d_{n-2} &= y_{n-1} \end{aligned} \tag{11}$$

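3. $S'(x)$ is continuous on $[x_1, x_n]$.

Because $S'(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

$$S_{i-1}'(x_i) = S_i'(x_i), \qquad i = 2, 3, \ldots, n - 1$$

or, from 5,

$$\begin{aligned} 3a_1h^2 + 2b_1h + c_1 &= c_2 \\ 3a_2h^2 + 2b_2h + c_2 &= c_3 \\ &\;\;\vdots \\ 3a_{n-2}h^2 + 2b_{n-2}h + c_{n-2} &= c_{n-1} \end{aligned} \tag{12}$$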


4. $S''(x)$ is continuous on $[x_1, x_n]$.

Because $S''(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

$$S_{i-1}''(x_i) = S_i''(x_i), \qquad i = 2, 3, \ldots, n - 1$$

or, from 6,

$$\begin{aligned} 6a_1h + 2b_1 &= 2b_2 \\ 6a_2h + 2b_2 &= 2b_3 \\ &\;\;\vdots \\ 6a_{n-2}h + 2b_{n-2} &= 2b_{n-1} \end{aligned} \tag{13}$$


Equations 8, 9, 11, 12, and 13 constitute a system of $4n - 6$ linear equations in the $4n - 4$ unknown coefficients $a_i, b_i, c_i, d_i$, $i = 1, 2, \ldots, n - 1$. Consequently, we need two more equations to determine these coefficients uniquely. Before obtaining these additional equations, however, we can simplify our existing system by expressing the unknowns $a_i$, $b_i$, $c_i$, and $d_i$ in terms of new unknown quantities

$$M_1 = S''(x_1),\quad M_2 = S''(x_2),\quad \ldots,\quad M_n = S''(x_n)$$

and the known quantities

$$y_1,\ y_2,\ \ldots,\ y_n$$

For example, from 6 it follows that

$$\begin{aligned} M_1 &= 2b_1 \\ M_2 &= 2b_2 \\ &\;\;\vdots \\ M_{n-1} &= 2b_{n-1} \end{aligned}$$

so

$$b_1 = \tfrac{1}{2}M_1,\quad b_2 = \tfrac{1}{2}M_2,\quad \ldots,\quad b_{n-1} = \tfrac{1}{2}M_{n-1}$$

Moreover, we already know from 8 that

$$d_1 = y_1,\quad d_2 = y_2,\quad \ldots,\quad d_{n-1} = y_{n-1}$$

We leave it as an exercise for you to derive the expressions for the $a_i$'s and $c_i$'s in terms of the $M_i$'s and $y_i$'s. The final result is as follows:

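Cubic Spline Interpolation

Given $n$ points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with $x_{i+1} - x_i = h$, $i = 1, 2, \ldots, n - 1$, the cubic spline

$$S(x) = \begin{cases} a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\ a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\ \quad\vdots & \quad\vdots \\ a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n \end{cases}$$

that interpolates these points has coefficients given by

$$\begin{aligned} a_i &= (M_{i+1} - M_i)/(6h) \\ b_i &= M_i/2 \\ c_i &= (y_{i+1} - y_i)/h - [(M_{i+1} + 2M_i)h/6] \\ d_i &= y_i \end{aligned} \tag{14}$$

for $i = 1, 2, \ldots, n - 1$, where $M_i = S''(x_i)$, $i = 1, 2, \ldots, n$.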
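Equations 14 translate directly into code. The Python sketch below, with hypothetical helper names `spline_coefficients` and `evaluate`, computes the coefficients from the $y_i$ and $M_i$ and evaluates $S(x)$ piece by piece. As a check it assumes the points of Exercise 3 below, which lie on $3x^3 - 2x^2 + 5x + 1$, and takes the $M_i$ from that cubic's second derivative $18x - 4$; the formulas should then reproduce the cubic exactly.

```python
from fractions import Fraction


def spline_coefficients(y, M, h):
    """Return [(a_i, b_i, c_i, d_i)] for i = 1, ..., n-1 using Equations 14."""
    coeffs = []
    for i in range(len(y) - 1):
        a = (M[i + 1] - M[i]) / (6 * h)
        b = M[i] / 2
        c = (y[i + 1] - y[i]) / h - (M[i + 1] + 2 * M[i]) * h / 6
        d = y[i]
        coeffs.append((a, b, c, d))
    return coeffs


def evaluate(coeffs, x_nodes, x):
    """Evaluate S(x) on the subinterval [x_i, x_{i+1}] that contains x."""
    i = 0
    while i < len(coeffs) - 1 and x > x_nodes[i + 1]:
        i += 1
    a, b, c, d = coeffs[i]
    t = x - x_nodes[i]
    return ((a * t + b) * t + c) * t + d  # Horner form of one piece of Eq. 3


# Exercise 3 check with exact fractions and h = 1.
x_nodes = [Fraction(v) for v in range(5)]
y = [3 * x**3 - 2 * x**2 + 5 * x + 1 for x in x_nodes]
M = [18 * x - 4 for x in x_nodes]
coeffs = spline_coefficients(y, M, Fraction(1))
print(coeffs[0])
```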


From this result, we see that the quantities $M_1, M_2, \ldots, M_n$ uniquely determine the cubic spline. To find these quantities, we substitute the expressions for $a_i$, $b_i$, and $c_i$ given in 14 into 12. After some algebraic simplification, we obtain

$$\begin{aligned} M_1 + 4M_2 + M_3 &= 6(y_1 - 2y_2 + y_3)/h^2 \\ M_2 + 4M_3 + M_4 &= 6(y_2 - 2y_3 + y_4)/h^2 \\ &\;\;\vdots \\ M_{n-2} + 4M_{n-1} + M_n &= 6(y_{n-2} - 2y_{n-1} + y_n)/h^2 \end{aligned} \tag{15}$$

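or, in matrix form,

$$\begin{bmatrix} 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 4 & \cdots & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 4 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1 \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \\ M_n \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-4} - 2y_{n-3} + y_{n-2} \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}$$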
This is a linear system of $n - 2$ equations for the $n$ unknowns $M_1, M_2, \ldots, M_n$. Thus, we still need two additional equations to determine $M_1, M_2, \ldots, M_n$ uniquely. The reason for this is that there are infinitely many cubic splines that interpolate the given points, so we simply do not have enough conditions to determine a unique cubic spline passing through the points. We discuss below three possible ways of specifying the two additional conditions required to obtain a unique cubic spline through the points. (The exercises present two more.) They are summarized in Table 1.

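Table 1

Natural Spline
Characterization: The second derivative of the spline is zero at the endpoints.
Additional conditions: $M_1 = 0$, $M_n = 0$
Linear system for $M_2, M_3, \ldots, M_{n-1}$:

$$\begin{bmatrix} 4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}$$

Parabolic Runout Spline
Characterization: The spline reduces to a parabolic curve on the first and last intervals.
Additional conditions: $M_1 = M_2$, $M_n = M_{n-1}$
Linear system for $M_2, M_3, \ldots, M_{n-1}$:

$$\begin{bmatrix} 5 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 5 \end{bmatrix} \begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}$$

Cubic Runout Spline
Characterization: The spline is a single cubic curve on the first two and last two intervals.
Additional conditions: $M_1 = 2M_2 - M_3$, $M_n = 2M_{n-1} - M_{n-2}$
Linear system for $M_2, M_3, \ldots, M_{n-1}$:

$$\begin{bmatrix} 6 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 6 \end{bmatrix} \begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}$$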

The Natural Spline 

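The two simplest mathematical conditions we can impose are

$$M_1 = M_n = 0$$

These conditions together with 15 result in an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$, which can be written in matrix form as

$$\begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} 0 \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n \\ 0 \end{bmatrix}$$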

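For numerical calculations it is more convenient to eliminate $M_1$ and $M_n$ from this system and write

$$\begin{bmatrix} 4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix} \tag{16}$$

together with

$$M_1 = 0 \tag{17}$$

$$M_n = 0 \tag{18}$$

Thus, the $(n - 2) \times (n - 2)$ linear system can be solved for the $n - 2$ coefficients $M_2, M_3, \ldots, M_{n-1}$, and $M_1$ and $M_n$ are determined by 17 and 18.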

Physically, the natural spline results when the ends of a drafting spline extend freely beyond the interpolating points without constraint. The end portions of the spline outside the interpolating points will fall on straight-line paths, causing $S''(x)$ to vanish at the endpoints $x_1$ and $x_n$ and resulting in the mathematical conditions $M_1 = M_n = 0$.

The natural spline tends to flatten the interpolating curve at the endpoints, which may be undesirable. Of course, if it is required that $S''(x)$ vanish at the endpoints, then the natural spline must be used.
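System 16 is tridiagonal, so it can be solved in $O(n)$ operations by the Thomas algorithm (Gaussian elimination specialized to tridiagonal matrices), a standard technique the text does not itself name. The Python sketch below uses hypothetical function names; skipping pivoting is safe here only because the matrix is strictly diagonally dominant (each diagonal 4 exceeds the sum of the off-diagonal entries in its row). It applies the natural spline to the water-density data of Example 1 below.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: tridiagonal Gaussian elimination without pivoting.

    Row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i];
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    for i in range(n):
        low = lower[i] if i > 0 else 0.0
        prev_c = c[i - 1] if i > 0 else 0.0
        prev_d = d[i - 1] if i > 0 else 0.0
        denom = diag[i] - low * prev_c
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - low * prev_d) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def natural_spline_M(y, h):
    """Return M_1, ..., M_n for the natural spline (system 16 plus 17, 18)."""
    m = len(y) - 2
    rhs = [6 * (y[i] - 2 * y[i + 1] + y[i + 2]) / h**2 for i in range(m)]
    inner = solve_tridiagonal([1.0] * m, [4.0] * m, [1.0] * m, rhs)
    return [0.0] + inner + [0.0]


# Water-density data of Example 1 (h = 10).
M = natural_spline_M([0.99815, 0.99987, 0.99973, 0.99823, 0.99567], 10.0)
print(M)
```

The resulting $M_2 \approx -.0000252$ gives $a_1 = (M_2 - M_1)/(6h) \approx -.00000042$, consistent with the leading coefficient in the answer to Exercise 4.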


The Parabolic Runout Spline 

The two additional constraints imposed for this type of spline are

$$M_1 = M_2 \tag{19}$$

$$M_n = M_{n-1} \tag{20}$$

If we use the preceding two equations to eliminate $M_1$ and $M_n$ from 15, we obtain the $(n - 2) \times (n - 2)$ linear system

$$\begin{bmatrix} 5 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 5 \end{bmatrix} \begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix} \tag{21}$$

for $M_2, M_3, \ldots, M_{n-1}$. Once these $n - 2$ values have been determined, $M_1$ and $M_n$ are determined from 19 and 20.

From 14 we see that $M_1 = M_2$ implies that $a_1 = 0$, and $M_n = M_{n-1}$ implies that $a_{n-1} = 0$. Thus, from 3 there are no cubic terms in the formula for the spline over the end intervals $[x_1, x_2]$ and $[x_{n-1}, x_n]$. Hence, as the name suggests, the parabolic runout spline reduces to a parabolic curve over these end intervals.


The Cubic Runout Spline 














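For this type of spline, we impose the two additional conditions

$$M_1 = 2M_2 - M_3 \tag{22}$$

$$M_n = 2M_{n-1} - M_{n-2} \tag{23}$$

Using these two equations to eliminate $M_1$ and $M_n$ from 15 results in the following $(n - 2) \times (n - 2)$ linear system for $M_2, M_3, \ldots, M_{n-1}$:

$$\begin{bmatrix} 6 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 6 \end{bmatrix} \begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}$$

After we solve this linear system for $M_2, M_3, \ldots, M_{n-1}$, we can use 22 and 23 to determine $M_1$ and $M_n$.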


If we rewrite 22 as

$$M_2 - M_1 = M_3 - M_2$$

it follows from 14 that $a_1 = a_2$. Because $S'''(x) = 6a_1$ on $[x_1, x_2]$ and $S'''(x) = 6a_2$ on $[x_2, x_3]$, we see that $S'''(x)$ is constant over the entire interval $[x_1, x_3]$. Consequently, $S(x)$ consists of a single cubic curve over the interval $[x_1, x_3]$ rather than two different cubic curves pieced together at $x_2$. [To see this, integrate $S'''(x)$ three times.] A similar analysis shows that $S(x)$ consists of a single cubic curve over the last two intervals.


Whereas the natural spline tends to produce an interpolating curve that is flat at the endpoints, the cubic runout spline has the 
opposite tendency: it produces a curve with pronounced curvature at the endpoints. If neither behavior is desired, the parabolic 
runout spline is a reasonable compromise. 
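As a check on the claim that the cubic runout spline is a single cubic over its end intervals, the Python sketch below solves the cubic runout system exactly for the five points of Exercise 3 below, which lie on $3x^3 - 2x^2 + 5x + 1$. If the claim holds, the computed $M_i$ must equal that cubic's second derivative $18x_i - 4$. The function names are hypothetical, and the tridiagonal solver is the Thomas algorithm without pivoting.

```python
from fractions import Fraction


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm (no pivoting); works with ints, floats, or Fractions."""
    n = len(diag)
    c, d = [0] * n, [0] * n
    for i in range(n):
        low = lower[i] if i > 0 else 0
        prev_c = c[i - 1] if i > 0 else 0
        prev_d = d[i - 1] if i > 0 else 0
        denom = diag[i] - low * prev_c
        c[i] = upper[i] / denom if i < n - 1 else 0
        d[i] = (rhs[i] - low * prev_d) / denom
    x = [0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def cubic_runout_M(y, h):
    """Return M_1, ..., M_n for the cubic runout spline (needs n >= 4)."""
    m = len(y) - 2
    rhs = [6 * (y[i] - 2 * y[i + 1] + y[i + 2]) / h**2 for i in range(m)]
    one, four, six, zero = Fraction(1), Fraction(4), Fraction(6), Fraction(0)
    diag, lower, upper = [four] * m, [one] * m, [one] * m
    # Eliminating M_1 with 22 turns the first row into 6*M_2 = rhs[0];
    # eliminating M_n with 23 turns the last row into 6*M_{n-1} = rhs[-1].
    diag[0], upper[0] = six, zero
    diag[-1], lower[-1] = six, zero
    inner = solve_tridiagonal(lower, diag, upper, rhs)
    M1 = 2 * inner[0] - inner[1]      # Equation 22
    Mn = 2 * inner[-1] - inner[-2]    # Equation 23
    return [M1] + inner + [Mn]


# Exercise 3 data (h = 1); the points lie on 3x^3 - 2x^2 + 5x + 1.
y = [Fraction(v) for v in (1, 7, 27, 79, 181)]
M = cubic_runout_M(y, Fraction(1))
print(M)
```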

EXAMPLE 1 Using a Parabolic Runout Spline 


The density of water is well known to reach a maximum at a temperature slightly above freezing. Table 2, from the Handbook of Chemistry and Physics (CRC Press, 2009), gives the density of water in grams per cubic centimeter for five equally spaced temperatures from -10°C to 30°C. We will interpolate these five temperature-density measurements with a parabolic runout spline and attempt to find the maximum density of water in this range by finding the maximum value on this cubic spline. In the exercises we ask you to perform similar calculations using a natural spline and a cubic runout spline to interpolate the data points.

Table 2

Temperature (°C)    Density (g/cm³)
-10                 .99815
0                   .99987
10                  .99973
20                  .99823
30                  .99567


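Set $h = 10$ and

$$\begin{aligned} x_1 &= -10, & y_1 &= .99815 \\ x_2 &= 0, & y_2 &= .99987 \\ x_3 &= 10, & y_3 &= .99973 \\ x_4 &= 20, & y_4 &= .99823 \\ x_5 &= 30, & y_5 &= .99567 \end{aligned}$$

Then

$$\begin{aligned} 6[y_1 - 2y_2 + y_3]/h^2 &= -.0001116 \\ 6[y_2 - 2y_3 + y_4]/h^2 &= -.0000816 \\ 6[y_3 - 2y_4 + y_5]/h^2 &= -.0000636 \end{aligned}$$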

and the linear system 21 for the parabolic runout spline becomes

$$\begin{bmatrix} 5 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 5 \end{bmatrix} \begin{bmatrix} M_2 \\ M_3 \\ M_4 \end{bmatrix} = \begin{bmatrix} -.0001116 \\ -.0000816 \\ -.0000636 \end{bmatrix}$$

Solving this system yields

$$M_2 = -.00001973, \quad M_3 = -.00001293, \quad M_4 = -.00001013$$

From 19 and 20, we have

$$M_1 = M_2 = -.00001973, \quad M_5 = M_4 = -.00001013$$


Solving for the $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s in 14, we obtain the following expression for the interpolating parabolic runout spline:

$$S(x) = \begin{cases} -.00000987(x + 10)^2 + .0002707(x + 10) + .99815, & -10 \le x \le 0 \\ .000000113(x - 0)^3 - .00000987(x - 0)^2 + .0000733(x - 0) + .99987, & 0 \le x \le 10 \\ .000000047(x - 10)^3 - .00000647(x - 10)^2 - .0000900(x - 10) + .99973, & 10 \le x \le 20 \\ -.00000507(x - 20)^2 - .0002053(x - 20) + .99823, & 20 \le x \le 30 \end{cases}$$

This spline is plotted in Figure 10.4.6. From that figure we see that the maximum is attained in the interval $[0, 10]$. To find this maximum, we set $S'(x)$ equal to zero in the interval $[0, 10]$:

$$S'(x) = .000000339x^2 - .0000197x + .0000733 = 0$$

To three significant digits the root of this quadratic in the interval $[0, 10]$ is $x = 3.99$, and for this value of $x$, $S(3.99) = 1.00001$. Thus, according to our interpolated estimate, the maximum density of water is 1.00001 g/cm³ attained at 3.99°C. This agrees well with the experimental maximum density of 1.00000 g/cm³ attained at 3.98°C. (In the original metric system, the gram was defined as the mass of one cubic centimeter of water at its maximum density.)
Figure 10.4.6
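The whole of Example 1 can be reproduced in a few lines. In the Python sketch below, the helper names are hypothetical, the tridiagonal system 21 is solved by the Thomas algorithm without pivoting, and the critical point on $[0, 10]$ is found with the quadratic formula. This is one reasonable way to carry out the steps above, not necessarily the authors' procedure.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm without pivoting (fine for diagonally dominant systems)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    for i in range(n):
        low = lower[i] if i > 0 else 0.0
        prev_c = c[i - 1] if i > 0 else 0.0
        prev_d = d[i - 1] if i > 0 else 0.0
        denom = diag[i] - low * prev_c
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - low * prev_d) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def parabolic_runout_M(y, h):
    """Return M_1, ..., M_n for the parabolic runout spline (system 21, 19, 20)."""
    m = len(y) - 2
    rhs = [6 * (y[i] - 2 * y[i + 1] + y[i + 2]) / h**2 for i in range(m)]
    diag = [4.0] * m
    diag[0] += 1.0    # substituting M_1 = M_2 into the first row
    diag[-1] += 1.0   # substituting M_n = M_{n-1} into the last row
    inner = solve_tridiagonal([1.0] * m, diag, [1.0] * m, rhs)
    return [inner[0]] + inner + [inner[-1]]


temps = [-10.0, 0.0, 10.0, 20.0, 30.0]
density = [0.99815, 0.99987, 0.99973, 0.99823, 0.99567]
h = 10.0
M = parabolic_runout_M(density, h)

# Coefficients (Equations 14) of the piece on [x_2, x_3] = [0, 10].
i = 1
a = (M[i + 1] - M[i]) / (6 * h)
b = M[i] / 2
c = (density[i + 1] - density[i]) / h - (M[i + 1] + 2 * M[i]) * h / 6
d = density[i]

# Solve S'(x) = 3a t^2 + 2b t + c = 0 with t = x - x_2; keep the root in [0, h].
disc = (2 * b) ** 2 - 4 * (3 * a) * c
roots = [(-2 * b + s * math.sqrt(disc)) / (6 * a) for s in (1.0, -1.0)]
t_max = next(t for t in roots if 0.0 <= t <= h)
x_max = temps[i] + t_max
s_max = ((a * t_max + b) * t_max + c) * t_max + d
print(round(x_max, 2), round(s_max, 5))
```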


Closing Remarks 

In addition to producing excellent interpolating curves, cubic splines and their generalizations are useful for numerical 
integration and differentiation, for the numerical solution of differential and integral equations, and in optimization theory. 

Exercise Set 10.4 

1. Derive the expressions for $a_i$ and $c_i$ in Equations 14 of Theorem 10.4.1.

2. The six points 

(0, .00000), (.2, .19867), (.4, .38942), 

(.6, .56464), (.8, .71736), (1.0, .84147) 

lie on the graph of y = sin x, where x is in radians. 

(a) Find the portion of the parabolic runout spline that interpolates these six points for $.4 \le x \le .6$. Maintain an accuracy of five decimal places in your calculations.

(b) Calculate $S(.5)$ for the spline you found in part (a). What is the percentage error of $S(.5)$ with respect to the “exact” value of $\sin(.5) = .47943$?

Answer:

(a) $S(x) = -.12643(x - .4)^3 - .20211(x - .4)^2 + .92158(x - .4) + .38942$

(b) $S(.5) = .47943$; error = 0%

3. The following five points 

(0,1), (1,7), (2, 27), (3,79), (4, 181) 

lie on a single cubic curve. 

(a) Which of the three types of cubic splines (natural, parabolic runout, or cubic runout) would agree exactly with the single 
cubic curve on which the five points lie? 

(b) Determine the cubic spline you chose in part (a), and verify that it is a single cubic curve that interpolates the five points. 
Answer: 


(a) The cubic runout spline 


(b) $S(x) = 3x^3 - 2x^2 + 5x + 1$

4. Repeat the calculations in Example 1 using a natural spline to interpolate the five data points.

Answer:

$$S(x) = \begin{cases} -.00000042(x + 10)^3 + .000214(x + 10) + .99815, & -10 \le x \le 0 \\ .00000024(x - 0)^3 - .0000126(x - 0)^2 + .000088(x - 0) + .99987, & 0 \le x \le 10 \\ -.00000004(x - 10)^3 - .0000054(x - 10)^2 - .000092(x - 10) + .99973, & 10 \le x \le 20 \\ .00000022(x - 20)^3 - .0000066(x - 20)^2 - .000212(x - 20) + .99823, & 20 \le x \le 30 \end{cases}$$

Maximum at $(x, S(x)) = (3.93, 1.00004)$

5. Repeat the calculations in Example 1 using a cubic runout spline to interpolate the five data points.

Answer:

$$S(x) = \begin{cases} .00000009(x + 10)^3 - .0000121(x + 10)^2 + .000282(x + 10) + .99815, & -10 \le x \le 0 \\ .00000009(x - 0)^3 - .0000093(x - 0)^2 + .000070(x - 0) + .99987, & 0 \le x \le 10 \\ .00000004(x - 10)^3 - .0000066(x - 10)^2 - .000087(x - 10) + .99973, & 10 \le x \le 20 \\ .00000004(x - 20)^3 - .0000053(x - 20)^2 - .000207(x - 20) + .99823, & 20 \le x \le 30 \end{cases}$$

Maximum at $(x, S(x)) = (4.00, 1.00001)$

6. Consider the five points $(0, 0)$, $(.5, 1)$, $(1, 0)$, $(1.5, -1)$, and $(2, 0)$ on the graph of $y = \sin(\pi x)$.

(a) Use a natural spline to interpolate the data points $(0, 0)$, $(.5, 1)$, and $(1, 0)$.

(b) Use a natural spline to interpolate the data points $(.5, 1)$, $(1, 0)$, and $(1.5, -1)$.

(c) Explain the unusual nature of your result in part (b).

Answer:

(a)
$$S(x) = \begin{cases} -4x^3 + 3x, & 0 \le x \le 0.5 \\ 4x^3 - 12x^2 + 9x - 1, & 0.5 \le x \le 1 \end{cases}$$

(b)
$$S(x) = \begin{cases} 2 - 2x, & 0.5 \le x \le 1 \\ 2 - 2x, & 1 \le x \le 1.5 \end{cases}$$

(c) The three data points are collinear.


7. (The Periodic Spline) If it is known or if it is desired that the $n$ points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ to be interpolated lie on a single cycle of a periodic curve with period $x_n - x_1$, then an interpolating cubic spline $S(x)$ must satisfy

$$\begin{aligned} S(x_1) &= S(x_n) \\ S'(x_1) &= S'(x_n) \\ S''(x_1) &= S''(x_n) \end{aligned}$$

(a) Show that these three periodicity conditions require that

$$\begin{aligned} y_1 &= y_n \\ M_1 &= M_n \\ 4M_1 + M_2 + M_{n-1} &= 6(y_{n-1} - 2y_1 + y_2)/h^2 \end{aligned}$$

(b) Using the three equations in part (a) and Equations 15, construct an $(n - 1) \times (n - 1)$ linear system for $M_1, M_2, \ldots, M_{n-1}$ in matrix form.

Answer:

$$\begin{bmatrix} 4 & 1 & 0 & 0 & \cdots & 0 & 0 & 1 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\ 1 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \frac{6}{h^2} \begin{bmatrix} y_{n-1} - 2y_1 + y_2 \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_1 \end{bmatrix}$$

8. (The Clamped Spline) Suppose that, in addition to the n points to be interpolated, we are given specific values yj and y r n for 
the slopes S r {x\) and S\x n ) of the interpolating cubic spline at the endpoints x \ and x n . 

(a) Show that 

2M1 + M2 = 6O2-71 -hy[) ih 2 

2 M„ + M„_i = 6(>„_i —y n -¥hy' n ) Ih 2 

(b) Using the equations in part (a) and Equations 15, construct an ^ x n linear system for M\, M 2 , M n in matrix form. 

The clamped spline described in this exercise is the most accurate type of spline for interpolation work if the 
slopes at the endpoints are known or can be estimated. 

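Answer:

$$\begin{bmatrix}
2 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} -y_1 + y_2 - hy_1' \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n \\ y_{n-1} - y_n + hy_n' \end{bmatrix}$$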
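The clamped system can be assembled the same way. The sketch below uses the hypothetical helper name `clamped_spline_system` and assumes equally spaced nodes. It builds A and b only, so any linear solver can be used afterward.

```python
from fractions import Fraction


def clamped_spline_system(y, h, slope_left, slope_right):
    """Assemble the n x n clamped-spline system A M = b for M_1, ..., M_n.

    y lists y_1, ..., y_n (stored 0-based); slope_left and slope_right are the
    prescribed end slopes y'_1 and y'_n; the nodes are equally spaced by h.
    """
    n = len(y)
    h = Fraction(h)
    scale = 6 / h ** 2
    A = [[0] * n for _ in range(n)]
    b = [Fraction(0)] * n
    # First row, from part (a): 2 M_1 + M_2 = 6 (y_2 - y_1 - h y'_1) / h^2
    A[0][0], A[0][1] = 2, 1
    b[0] = scale * (y[1] - y[0] - h * slope_left)
    # Interior rows: M_{i-1} + 4 M_i + M_{i+1} = 6 (y_{i-1} - 2 y_i + y_{i+1}) / h^2
    for i in range(1, n - 1):
        A[i][i - 1], A[i][i], A[i][i + 1] = 1, 4, 1
        b[i] = scale * (y[i - 1] - 2 * y[i] + y[i + 1])
    # Last row, from part (a): M_{n-1} + 2 M_n = 6 (y_{n-1} - y_n + h y'_n) / h^2
    A[n - 1][n - 2], A[n - 1][n - 1] = 1, 2
    b[n - 1] = scale * (y[n - 2] - y[n - 1] + h * slope_right)
    return A, b


# Data from y = x^2 at x = 0, 1, 2, 3 with exact end slopes y'(0) = 0, y'(3) = 6.
A, b = clamped_spline_system([0, 1, 4, 9], 1, 0, 6)
M = [2, 2, 2, 2]                        # S'' = 2 everywhere for the parabola itself
residual = [sum(a * m for a, m in zip(row, M)) - bi for row, bi in zip(A, b)]
print(residual)
```

The usage example is a sanity check. A clamped spline with exact end slopes reproduces any polynomial of degree at most 3, so $M_i = 2$ must satisfy every row.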


Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica,
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some 
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 


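T1. In the solution of the natural cubic spline problem, it is necessary to solve a system of equations having coefficient matrix

$$A_n = \begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}$$

If we can present a formula for the inverse of this matrix, then the solution for the natural cubic spline problem can be easily obtained. In this exercise and the next, we use a computer to discover this formula. Toward this end, we first determine an expression for the determinant of $A_n$, denoted by the symbol $D_n$. Given that

$$A_1 = [4] \quad\text{and}\quad A_2 = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}$$

we see that

$$D_1 = \det(A_1) = \det[4] = 4 \quad\text{and}\quad D_2 = \det(A_2) = \det\begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} = 15$$

(a) Use the cofactor expansion of determinants to show that

$$D_n = 4D_{n-1} - D_{n-2}$$

for n = 3, 4, 5, .... This says, for example, that

$$D_3 = 4D_2 - D_1 = 4(15) - 4 = 56$$
$$D_4 = 4D_3 - D_2 = 4(56) - 15 = 209$$

and so on. Using a computer, check this result for 5 ≤ n ≤ 10.

(b) By writing

$$D_n = 4D_{n-1} - D_{n-2}$$

and the identity $D_{n-1} = D_{n-1}$ in matrix form,

$$\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix} = \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} D_{n-1} \\ D_{n-2} \end{bmatrix}$$

show that

$$\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix} = \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}\begin{bmatrix} D_2 \\ D_1 \end{bmatrix} = \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}\begin{bmatrix} 15 \\ 4 \end{bmatrix}$$

(c) Use the methods in Section 5.2 and a computer to show that

$$\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2} = \frac{1}{2\sqrt{3}}\begin{bmatrix} (2+\sqrt{3})^{n-1} - (2-\sqrt{3})^{n-1} & (2-\sqrt{3})^{n-2} - (2+\sqrt{3})^{n-2} \\ (2+\sqrt{3})^{n-2} - (2-\sqrt{3})^{n-2} & (2-\sqrt{3})^{n-3} - (2+\sqrt{3})^{n-3} \end{bmatrix}$$

and hence

$$D_n = \frac{(2+\sqrt{3})^{n+1} - (2-\sqrt{3})^{n+1}}{2\sqrt{3}}$$

for n = 1, 2, 3, ....

(d) Using a computer, check this result for 1 ≤ n ≤ 10.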


T2. In this exercise, we determine a formula for calculating $A_n^{-1}$ from $D_k$ for k = 0, 1, 2, 3, ..., n, assuming that $D_0$ is defined to be 1.

(a) Use a computer to compute $A_k^{-1}$ for k = 1, 2, 3, 4, and 5.

(b) From your results in part (a), discover the conjecture that

$$A_n^{-1} = [\alpha_{ij}]$$

where $\alpha_{ij} = \alpha_{ji}$ and

$$\alpha_{ij} = (-1)^{i+j}\,\frac{D_{i-1}\,D_{n-j}}{D_n}$$

for i ≤ j.

(c) Use the result in part (b) to compute A^ and compare it to the result obtained using the computer. 
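The formula in part (b) can be tested exactly with rational arithmetic. The Python sketch below builds $A_n^{-1}$ entry by entry from the $D_k$ and confirms that $A_n A_n^{-1} = I$ for n = 1, ..., 7. The helper names are hypothetical. This is a numerical check of the conjecture, not a proof.

```python
from fractions import Fraction


def D(k):
    """D_k = det(A_k) from D_k = 4 D_{k-1} - D_{k-2}, with D_0 = 1 and D_1 = 4."""
    prev, cur = 1, 4                      # D_0, D_1
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, 4 * cur - prev
    return cur


def inverse_by_formula(n):
    """alpha_ij = (-1)^(i+j) D_{i-1} D_{n-j} / D_n for i <= j (1-based), mirrored for i > j."""
    inv = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            val = Fraction((-1) ** (i + j) * D(i - 1) * D(n - j), D(n))
            inv[i - 1][j - 1] = inv[j - 1][i - 1] = val
    return inv


def tridiag(n):
    return [[4 if i == j else 1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


for n in range(1, 8):
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    assert matmul(tridiag(n), inverse_by_formula(n)) == identity
print(inverse_by_formula(3))
```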
10.5 Markov Chains

In this section we describe a general model of a system that changes from state to state. We then apply the model 
to several concrete problems. 


Prerequisites 

Linear Systems 
Matrices 

Intuitive Understanding of Limits 


A Markov Process 

Suppose a physical or mathematical system undergoes a process of change such that at any moment it can occupy 
one of a finite number of states. For example, the weather in a certain city could be in one of three possible 
states: sunny, cloudy, or rainy. Or an individual could be in one of four possible emotional states: happy, sad, 
angry, or apprehensive. Suppose that such a system changes with time from one state to another and at scheduled 
times the state of the system is observed. If the state of the system at any observation cannot be predicted with 
certainty, but the probability that a given state occurs can be predicted by just knowing the state of the system at 
the preceding observation, then the process of change is called a Markov chain or Markov process. 


DEFINITION 1 

If a Markov chain has k possible states, which we label as 1, 2, ..., k, then the probability that the system is in state i at any observation after it was in state j at the preceding observation is denoted by $p_{ij}$ and is called the transition probability from state j to state i. The matrix $P = [p_{ij}]$ is called the transition matrix of the Markov chain.


For example, in a three-state Markov chain, the transition matrix has the form

Preceding State (columns 1, 2, 3); New State (rows 1, 2, 3):

$$\begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}$$

In this matrix, $p_{32}$ is the probability that the system will change from state 2 to state 3, $p_{11}$ is the probability that the system will still be in state 1 if it was previously in state 1, and so forth.




EXAMPLE 1 Transition Matrix of the Markov Chain 


A car rental agency has three rental locations, denoted by 1, 2, and 3. A customer may rent a car 
from any of the three locations and return the car to any of the three locations. The manager finds 
that customers return the cars to the various locations according to the following probabilities: 

Rented from Location (columns 1, 2, 3); Returned to Location (rows 1, 2, 3):

$$\begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix}$$

This matrix is the transition matrix of the system considered as a Markov chain. From this matrix, the probability is .6 that a car rented from location 3 will be returned to location 2, the probability is .8 that a car rented from location 1 will be returned to location 1, and so forth.


EXAMPLE 2 Transition Matrix of the Markov Chain 

By reviewing its donation records, the alumni office of a college finds that 80% of its alumni who 
contribute to the annual fund one year will also contribute the next year, and 30% of those who do 
not contribute one year will contribute the next. This can be viewed as a Markov chain with two 
states: state 1 corresponds to an alumnus giving a donation in any one year, and state 2 corresponds 
to the alumnus not giving a donation in that year. The transition matrix is

$$P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}$$


In the examples above, the transition matrices of the Markov chains have the property that the entries in any 
column sum to 1. This is not accidental. If P = [Pij] is the transition matrix of any Markov chain with k states, 
then for each j we must have 


$$p_{1j} + p_{2j} + \cdots + p_{kj} = 1 \qquad (1)$$

because if the system is in state j at one observation, it is certain to be in one of the k possible states at the next observation.

A matrix with property (1) is called a stochastic matrix, a probability matrix, or a Markov matrix. From the preceding discussion, it follows that the transition matrix for a Markov chain must be a stochastic matrix.

In a Markov chain, the state of the system at any observation time cannot generally be determined with certainty. 
The best one can usually do is specify probabilities for each of the possible states. For example, in a Markov 
chain with three states, we might describe the possible state of the system at some observation time by a column vector

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

in which $x_1$ is the probability that the system is in state 1, $x_2$ the probability that it is in state 2, and $x_3$ the probability that it is in state 3. In general we make the following definition.


DEFINITION 2 

The state vector for an observation of a Markov chain with k states is a column vector x whose ith component $x_i$ is the probability that the system is in the ith state at that time.

Observe that the entries in any state vector for a Markov chain are nonnegative and have a sum of 1. (Why?) A 
column vector that has this property is called a probability vector. 


Let us suppose now that we know the state vector $\mathbf{x}^{(0)}$ for a Markov chain at some initial observation. The following theorem will enable us to determine the state vectors

$$\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}, \ldots$$

at the subsequent observation times.


THEOREM 10.5.1 

If P is the transition matrix of a Markov chain and $\mathbf{x}^{(n)}$ is the state vector at the nth observation, then

$$\mathbf{x}^{(n+1)} = P\mathbf{x}^{(n)}$$


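The proof of this theorem involves ideas from probability theory and will not be given here. From this theorem, it follows that

$$\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = P^2\mathbf{x}^{(0)} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = P^3\mathbf{x}^{(0)} \\
&\ \ \vdots \\
\mathbf{x}^{(n)} &= P\mathbf{x}^{(n-1)} = P^n\mathbf{x}^{(0)}
\end{aligned}$$

In this way, the initial state vector $\mathbf{x}^{(0)}$ and the transition matrix P determine $\mathbf{x}^{(n)}$ for n = 1, 2, ....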
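The relation $\mathbf{x}^{(n)} = P\mathbf{x}^{(n-1)} = P^n\mathbf{x}^{(0)}$ can be sketched in a few lines of Python. The helpers below are hypothetical names. They use exact fractions so that the column-sum test for property (1) is not disturbed by rounding. The usage lines anticipate the alumni chain of Examples 2 and 3.

```python
from fractions import Fraction


def is_stochastic(P):
    """Property (1): nonnegative entries and every column summing to 1."""
    return (all(p >= 0 for row in P for p in row)
            and all(sum(col) == 1 for col in zip(*P)))


def next_state(P, x):
    """One step of Theorem 10.5.1: x^(n+1) = P x^(n)."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


def state_vectors(P, x0, steps):
    """Return [x^(0), x^(1), ..., x^(steps)]; the last entry equals P^steps x^(0)."""
    xs = [list(x0)]
    for _ in range(steps):
        xs.append(next_state(P, xs[-1]))
    return xs


# Alumni chain: state 1 = donates, state 2 = does not donate.
F = Fraction
P = [[F("0.8"), F("0.3")],
     [F("0.2"), F("0.7")]]
assert is_stochastic(P)
xs = state_vectors(P, [F(0), F(1)], 11)     # graduate who did not donate at first
for n, x in enumerate(xs):
    print(n, [round(float(v), 3) for v in x])
```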

EXAMPLE 3 Example 2 Revisited 

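The transition matrix in Example 2 was

$$P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}$$

We now construct the probable future donation record of a new graduate who did not give a donation in the initial year after graduation. For such a graduate the system is initially in state 2 with certainty, so the initial state vector is

$$\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

From Theorem 10.5.1 we then have

$$\mathbf{x}^{(1)} = P\mathbf{x}^{(0)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} .3 \\ .7 \end{bmatrix}$$

$$\mathbf{x}^{(2)} = P\mathbf{x}^{(1)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} .3 \\ .7 \end{bmatrix} = \begin{bmatrix} .45 \\ .55 \end{bmatrix}$$

$$\mathbf{x}^{(3)} = P\mathbf{x}^{(2)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} .45 \\ .55 \end{bmatrix} = \begin{bmatrix} .525 \\ .475 \end{bmatrix}$$

Thus, after three years the alumnus can be expected to make a donation with probability .525. Beyond three years, we find the following state vectors (to three decimal places):

$$\mathbf{x}^{(4)} = \begin{bmatrix} .563 \\ .438 \end{bmatrix},\quad \mathbf{x}^{(5)} = \begin{bmatrix} .581 \\ .419 \end{bmatrix},\quad \mathbf{x}^{(6)} = \begin{bmatrix} .591 \\ .409 \end{bmatrix},\quad \mathbf{x}^{(7)} = \begin{bmatrix} .595 \\ .405 \end{bmatrix}$$

$$\mathbf{x}^{(8)} = \begin{bmatrix} .598 \\ .402 \end{bmatrix},\quad \mathbf{x}^{(9)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix},\quad \mathbf{x}^{(10)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix},\quad \mathbf{x}^{(11)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix}$$

For all n beyond 11, we have

$$\mathbf{x}^{(n)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix}$$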

to three decimal places. In other words, the state vectors converge to a fixed vector as the number of 
observations increases. (We will discuss this further below.) 



EXAMPLE 4 Example 1 Revisited 


The transition matrix in Example 1 was

$$P = \begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix}$$

If a car is rented initially from location 2, then the initial state vector is

$$\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

Using this vector and Theorem 10.5.1, one obtains the later state vectors listed in Table 1.


Table 1

 n          0     1     2     3     4     5     6     7     8     9    10    11
x_1^(n)     0  .300  .400  .477  .511  .533  .544  .550  .553  .555  .556  .557
x_2^(n)     1  .200  .370  .252  .261  .240  .238  .233  .232  .231  .230  .230
x_3^(n)     0  .500  .230  .271  .228  .227  .219  .217  .215  .214  .214  .213

For all values of n greater than 11, all state vectors are equal to $\mathbf{x}^{(11)}$ to three decimal places.

Two things should be observed in this example. First, it was not necessary to know how long a customer kept the car. That is, in a Markov process the time period between observations need not be regular. Second, the state vectors approach a fixed vector as n increases, just as in the first example.
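The entries of Table 1 can be regenerated with a few lines of code. This sketch uses plain Python lists of floats rather than any particular linear algebra package. It prints $\mathbf{x}^{(n)}$ for n = 0, ..., 11 rounded to three decimal places.

```python
# Car rental chain: column j lists where a car rented at location j is returned.
P = [[0.8, 0.3, 0.2],
     [0.1, 0.2, 0.6],
     [0.1, 0.5, 0.2]]
x = [0.0, 1.0, 0.0]            # x^(0): car rented initially from location 2

table = [x]
for n in range(1, 12):
    # Theorem 10.5.1: x^(n) = P x^(n-1)
    x = [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]
    table.append(x)

print("n   x1    x2    x3")
for n, vec in enumerate(table):
    print(f"{n:<3}" + " ".join(f"{v:.3f}" for v in vec))
```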


EXAMPLE 5 Using Theorem 10.5.1


A traffic officer is assigned to control the traffic at the eight intersections indicated in Figure 10.5.1. She is instructed to remain at each intersection for an hour and then to either remain at the same intersection or move to a neighboring intersection. To avoid establishing a pattern, she is told to choose her new intersection on a random basis, with each possible choice equally likely. For example, if she is at intersection 5, her next intersection can be 2, 4, 5, or 8, each with probability 1/4. Every day she starts at the location where she stopped the day before. The transition matrix for this Markov chain is

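Old Intersection (columns 1 to 8); New Intersection (rows 1 to 8):

$$P = \begin{bmatrix}
\frac13 & \frac13 & 0 & \frac15 & 0 & 0 & 0 & 0 \\
\frac13 & \frac13 & 0 & 0 & \frac14 & 0 & 0 & 0 \\
0 & 0 & \frac13 & \frac15 & 0 & \frac13 & 0 & 0 \\
\frac13 & 0 & \frac13 & \frac15 & \frac14 & 0 & \frac14 & 0 \\
0 & \frac13 & 0 & \frac15 & \frac14 & 0 & 0 & \frac13 \\
0 & 0 & \frac13 & 0 & 0 & \frac13 & \frac14 & 0 \\
0 & 0 & 0 & \frac15 & 0 & \frac13 & \frac14 & \frac13 \\
0 & 0 & 0 & 0 & \frac14 & 0 & \frac14 & \frac13
\end{bmatrix}$$

Figure 10.5.1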

If the traffic officer begins at intersection 5, her probable locations, hour by hour, are given by the state vectors in Table 2. For all values of n greater than 22, all state vectors are equal to $\mathbf{x}^{(22)}$ to three decimal places. Thus, as with the first two examples, the state vectors approach a fixed vector as n increases.


Table 2

 n          0     1     2     3     4     5    10    15    20    22
x_1^(n)     0  .000  .133  .116  .130  .123  .113  .109  .108  .107
x_2^(n)     0  .250  .146  .163  .140  .138  .115  .109  .108  .107
x_3^(n)     0  .000  .050  .039  .067  .073  .100  .106  .107  .107
x_4^(n)     0  .250  .113  .187  .162  .178  .178  .179  .179  .179
x_5^(n)     1  .250  .279  .190  .190  .168  .149  .144  .143  .143
x_6^(n)     0  .000  .000  .050  .056  .074  .099  .105  .107  .107
x_7^(n)     0  .000  .133  .104  .131  .125  .138  .142  .143  .143
x_8^(n)     0  .250  .146  .152  .124  .121  .108  .107  .107  .107
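The transition matrix of this example follows a simple rule: from each intersection, the officer stays put or moves to an adjacent intersection, each with equal probability. The Python sketch below builds P from adjacency lists and iterates Theorem 10.5.1 to reproduce the columns of Table 2. The adjacency lists are read off the transition matrix of this example. They are an assumption about Figure 10.5.1, which is not reproduced here. The helper names are hypothetical.

```python
from fractions import Fraction

# Adjacent intersections (assumed layout of Figure 10.5.1, read off the matrix).
NEIGHBORS = {1: [2, 4], 2: [1, 5], 3: [4, 6], 4: [1, 3, 5, 7],
             5: [2, 4, 8], 6: [3, 7], 7: [4, 6, 8], 8: [5, 7]}


def officer_matrix(neighbors):
    """Column j gives equal probability to staying at j or moving to a neighbor of j."""
    k = len(neighbors)
    P = [[Fraction(0)] * k for _ in range(k)]
    for old, adjacent in neighbors.items():
        choices = [old] + adjacent
        for new in choices:
            P[new - 1][old - 1] = Fraction(1, len(choices))
    return P


def step(P, x):
    """x^(n+1) = P x^(n)."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


P = officer_matrix(NEIGHBORS)
x = [Fraction(int(i == 4)) for i in range(8)]   # x^(0): she starts at intersection 5
for n in range(1, 23):
    x = step(P, x)
    if n in (1, 2, 3, 4, 5, 10, 15, 20, 22):    # the columns shown in Table 2
        print(n, [f"{float(v):.3f}" for v in x])
```

Exact fractions keep every column sum equal to 1 at every step. The price is that denominators grow with n, which is harmless for 22 steps.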


Limiting Behavior of the State Vectors 

In our examples we saw that the state vectors approached some fixed vector as the number of observations 
increased. We now ask whether the state vectors always approach a fixed vector in a Markov chain. A simple 
example shows that this is not the case. 

EXAMPLE 6 System Oscillates Between Two State Vectors 

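Let

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad \mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Then, because $P^2 = I$ and $P^3 = P$, we have that

$$\mathbf{x}^{(0)} = \mathbf{x}^{(2)} = \mathbf{x}^{(4)} = \cdots = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

and

$$\mathbf{x}^{(1)} = \mathbf{x}^{(3)} = \mathbf{x}^{(5)} = \cdots = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

This system oscillates indefinitely between the two state vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, so it does not approach any fixed vector.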

However, if we impose a mild condition on the transition matrix, we can show that a fixed limiting state vector is 
approached. This condition is described by the following definition. 


DEFINITION 3 

A transition matrix is regular if some integer power of it has all positive entries. 


Thus, for a regular transition matrix P, there is some positive integer m such that all entries of $P^m$ are positive. This is the case with the transition matrices of Examples 1 and 2 for m = 1. In Example 5 it turns out that $P^4$ has all positive entries. Consequently, in all three examples the transition matrices are regular.
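Regularity can be tested mechanically by computing successive powers. The Python sketch below (the helper name `first_positive_power` is hypothetical) returns the smallest m for which $P^m$ has all positive entries, or `None` if no such power exists. It stops at $m = (k-1)^2 + 1$, a classical bound due to Wielandt for nonnegative k × k matrices. That bound is background knowledge and is not proved in this section.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def first_positive_power(P):
    """Smallest m such that every entry of P^m is positive, or None.

    For a k x k nonnegative matrix it suffices to test m up to (k - 1)^2 + 1
    (Wielandt's bound); if no such power is positive, P is not regular.
    """
    k = len(P)
    power = P
    for m in range(1, (k - 1) ** 2 + 2):
        if all(v > 0 for row in power for v in row):
            return m
        power = mat_mul(power, P)
    return None


example_1 = [[0.8, 0.3, 0.2], [0.1, 0.2, 0.6], [0.1, 0.5, 0.2]]
example_6 = [[0, 1], [1, 0]]
print(first_positive_power(example_1))   # 1: already all positive
print(first_positive_power(example_6))   # None: powers alternate between P and I
```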

A Markov chain that is governed by a regular transition matrix is called a regular Markov chain. We will see that every regular Markov chain has a fixed state vector q such that $P^n\mathbf{x}^{(0)}$ approaches q as n increases for any choice of $\mathbf{x}^{(0)}$. This result is of major importance in the theory of Markov chains. It is based on the following theorem.


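THEOREM 10.5.2 Behavior of $P^n$ as $n \to \infty$

If P is a regular transition matrix, then as $n \to \infty$,

$$P^n \to \begin{bmatrix} q_1 & q_1 & \cdots & q_1 \\ q_2 & q_2 & \cdots & q_2 \\ \vdots & \vdots & & \vdots \\ q_k & q_k & \cdots & q_k \end{bmatrix}$$

where the $q_i$ are positive numbers such that $q_1 + q_2 + \cdots + q_k = 1$.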


We will not prove this theorem here. We refer you to a more specialized text, such as J. Kemeny and J. Snell, 
Finite Markov Chains (New York: Springer-Verlag, 1976). 


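Let us set

$$Q = \begin{bmatrix} q_1 & q_1 & \cdots & q_1 \\ q_2 & q_2 & \cdots & q_2 \\ \vdots & \vdots & & \vdots \\ q_k & q_k & \cdots & q_k \end{bmatrix} \quad\text{and}\quad \mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}$$

Thus, Q is a transition matrix, all of whose columns are equal to the probability vector q. Q has the property that if x is any probability vector, then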



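$$Q\mathbf{x} = \begin{bmatrix} q_1 & q_1 & \cdots & q_1 \\ q_2 & q_2 & \cdots & q_2 \\ \vdots & \vdots & & \vdots \\ q_k & q_k & \cdots & q_k \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} = \begin{bmatrix} q_1x_1 + q_1x_2 + \cdots + q_1x_k \\ q_2x_1 + q_2x_2 + \cdots + q_2x_k \\ \vdots \\ q_kx_1 + q_kx_2 + \cdots + q_kx_k \end{bmatrix} = (x_1 + x_2 + \cdots + x_k)\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix} = (1)\mathbf{q} = \mathbf{q}$$

That is, Q transforms any probability vector x into the fixed probability vector q. This result leads to the following theorem.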


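THEOREM 10.5.3 Behavior of $P^n\mathbf{x}$ as $n \to \infty$

If P is a regular transition matrix and x is any probability vector, then as $n \to \infty$,

$$P^n\mathbf{x} \to \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix} = \mathbf{q}$$

where q is a fixed probability vector, independent of n, all of whose entries are positive.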


This result holds since Theorem 10.5.2 implies that $P^n \to Q$ as $n \to \infty$. This in turn implies that $P^n\mathbf{x} \to Q\mathbf{x} = \mathbf{q}$ as $n \to \infty$. Thus, for a regular Markov chain, the system eventually approaches a fixed state vector q. The vector q is called the steady-state vector of the regular Markov chain.

For systems with many states, usually the most efficient technique of computing the steady-state vector q is simply to calculate $P^n\mathbf{x}$ for some large n. Our examples illustrate this procedure. Each is a regular Markov process, so that convergence to a steady-state vector is ensured. Another way of computing the steady-state vector is to make use of the following theorem.


THEOREM 10.5.4 Steady-State Vector

The steady-state vector q of a regular transition matrix P is the unique probability vector that satisfies the equation $P\mathbf{q} = \mathbf{q}$.
















To see this, consider the matrix identity $PP^n = P^{n+1}$. By Theorem 10.5.2, both $P^n$ and $P^{n+1}$ approach Q as $n \to \infty$. Thus, we have $PQ = Q$. Any one column of this matrix equation gives $P\mathbf{q} = \mathbf{q}$. To show that q is the only probability vector that satisfies this equation, suppose r is another probability vector such that $P\mathbf{r} = \mathbf{r}$. Then also $P^n\mathbf{r} = \mathbf{r}$ for n = 1, 2, .... When we let $n \to \infty$, Theorem 10.5.3 leads to $\mathbf{q} = \mathbf{r}$.


Theorem 10.5.4 can also be expressed by the statement that the homogeneous linear system

$$(I - P)\mathbf{q} = \mathbf{0}$$

has a unique solution vector q with nonnegative entries that satisfy the condition $q_1 + q_2 + \cdots + q_k = 1$. We can apply this technique to the computation of the steady-state vectors for our examples.
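The computation described by Theorem 10.5.4 can be automated. The Python sketch below (the helper name `steady_state` is hypothetical) relies on one fact: because the columns of P sum to 1, the rows of I − P add up to the zero row, so one equation of $(I - P)\mathbf{q} = \mathbf{0}$ is redundant. The sketch replaces that equation with $q_1 + \cdots + q_k = 1$ and solves the resulting square system exactly. It assumes P is regular, so that the solution is unique.

```python
from fractions import Fraction


def steady_state(P):
    """Solve (I - P) q = 0 together with q_1 + ... + q_k = 1.

    Entries of P may be ints, Fractions, or decimal strings such as "0.8".
    The last (redundant) row of I - P is replaced by the normalization
    condition, and the square system is solved by Gauss-Jordan elimination.
    """
    k = len(P)
    rows = [[Fraction(int(i == j)) - Fraction(P[i][j]) for j in range(k)] + [Fraction(0)]
            for i in range(k)]
    rows[-1] = [Fraction(1)] * k + [Fraction(1)]   # q_1 + ... + q_k = 1
    for col in range(k):
        pivot_row = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot_row] = rows[pivot_row], rows[col]
        pivot = rows[col][col]
        rows[col] = [v / pivot for v in rows[col]]
        for r in range(k):
            factor = rows[r][col]
            if r != col and factor != 0:
                rows[r] = [a - factor * c for a, c in zip(rows[r], rows[col])]
    return [row[k] for row in rows]


print(steady_state([["0.8", "0.3"], ["0.2", "0.7"]]))
print(steady_state([["0.8", "0.3", "0.2"],
                    ["0.1", "0.2", "0.6"],
                    ["0.1", "0.5", "0.2"]]))
```

Decimal strings are passed to `Fraction` on purpose. `Fraction(0.8)` built from a float would carry the binary rounding error of 0.8 into the exact computation.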

EXAMPLE 7 Example 2 Revisited 

In Example 2 the transition matrix was

$$P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}$$

so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

$$\begin{bmatrix} .2 & -.3 \\ -.2 & .3 \end{bmatrix}\begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad (2)$$

This leads to the single independent equation

$$.2q_1 - .3q_2 = 0$$

or

$$q_1 = 1.5q_2$$

Thus, when we set $q_2 = s$, any solution of (2) is of the form

$$\mathbf{q} = s\begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$$

where s is an arbitrary constant. To make the vector q a probability vector, we set $s = 1/(1.5 + 1) = .4$. Consequently,

$$\mathbf{q} = \begin{bmatrix} .6 \\ .4 \end{bmatrix}$$

is the steady-state vector of this regular Markov chain. This means that over the long run, 60% of the alumni will give a donation in any one year, and 40% will not. Observe that this agrees with the result obtained numerically in Example 3.


EXAMPLE 8 Example 1 Revisited 


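In Example 1 the transition matrix was

$$P = \begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix}$$

so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

$$\begin{bmatrix} .2 & -.3 & -.2 \\ -.1 & .8 & -.6 \\ -.1 & -.5 & .8 \end{bmatrix}\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

The reduced row echelon form of the coefficient matrix is (verify)

$$\begin{bmatrix} 1 & 0 & -\frac{34}{13} \\ 0 & 1 & -\frac{14}{13} \\ 0 & 0 & 0 \end{bmatrix}$$

so the original linear system is equivalent to the system

$$q_1 = \tfrac{34}{13}\,q_3, \qquad q_2 = \tfrac{14}{13}\,q_3$$

When we set $q_3 = s$, any solution of the linear system is of the form

$$\mathbf{q} = s\begin{bmatrix} \frac{34}{13} \\ \frac{14}{13} \\ 1 \end{bmatrix}$$

To make this a probability vector, we set

$$s = \frac{1}{\frac{34}{13} + \frac{14}{13} + 1} = \frac{13}{61}$$

Thus, the steady-state vector of the system is

$$\mathbf{q} = \begin{bmatrix} \frac{34}{61} \\ \frac{14}{61} \\ \frac{13}{61} \end{bmatrix} = \begin{bmatrix} .5573\ldots \\ .2295\ldots \\ .2131\ldots \end{bmatrix}$$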


This agrees with the result obtained numerically in Table 1. The entries of q give the long-run 
probabilities that any one car will be returned to location 1, 2, or 3, respectively. If the car rental 
agency has a fleet of 1000 cars, it should design its facilities so that there are at least 558 spaces at 
location 1, at least 230 spaces at location 2, and at least 214 spaces at location 3. 


EXAMPLE 9 Example 5 Revisited 

We will not give the details of the calculations but simply state that the unique probability vector 
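solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

$$\mathbf{q} = \begin{bmatrix} \frac{3}{28} \\ \frac{3}{28} \\ \frac{3}{28} \\ \frac{5}{28} \\ \frac{4}{28} \\ \frac{3}{28} \\ \frac{4}{28} \\ \frac{3}{28} \end{bmatrix} = \begin{bmatrix} .1071\ldots \\ .1071\ldots \\ .1071\ldots \\ .1785\ldots \\ .1428\ldots \\ .1071\ldots \\ .1428\ldots \\ .1071\ldots \end{bmatrix}$$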


The entries in this vector indicate the proportion of time the traffic officer spends at each 
intersection over the long term. Thus, if the objective is for her to spend the same proportion of 
time at each intersection, then the strategy of random movement with equal probabilities from one 
intersection to another is not a good one. (See Exercise 5.) 


Exercise Set 10.5 

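1. Consider the transition matrix

$$P = \begin{bmatrix} .4 & .5 \\ .6 & .5 \end{bmatrix}$$

(a) Calculate $\mathbf{x}^{(n)}$ for n = 1, 2, 3, 4, 5 if $\mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

(b) State why P is regular and find its steady-state vector.

Answer:

(a) $\mathbf{x}^{(1)} = \begin{bmatrix} .4 \\ .6 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} .46 \\ .54 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} .454 \\ .546 \end{bmatrix}$, $\mathbf{x}^{(4)} = \begin{bmatrix} .4546 \\ .5454 \end{bmatrix}$, $\mathbf{x}^{(5)} = \begin{bmatrix} .45454 \\ .54546 \end{bmatrix}$

(b) P is regular since all entries of P are positive; $\mathbf{q} = \begin{bmatrix} \frac{5}{11} \\ \frac{6}{11} \end{bmatrix}$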


2. Consider the transition matrix 


$$P = \begin{bmatrix} .2 & .1 & .7 \\ .6 & .4 & .2 \\ .2 & .5 & .1 \end{bmatrix}$$
























(a) Calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$ to three decimal places if

roi 



(b) State why P is regular and find its steady-state vector. 



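(a) $\begin{bmatrix} \frac{9}{17} \\ \frac{8}{17} \end{bmatrix}$ (b) $\begin{bmatrix} \frac{26}{45} \\ \frac{19}{45} \end{bmatrix}$ (c) $\begin{bmatrix} \frac{3}{19} \\ \frac{4}{19} \\ \frac{12}{19} \end{bmatrix}$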


4. Let P be the transition matrix 


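(a) Show that P is not regular.

(b) Show that as n increases, $P^n\mathbf{x}^{(0)}$ approaches $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any initial state vector $\mathbf{x}^{(0)}$.

(c) What conclusion of Theorem 10.5.3 is not valid for the steady state of this transition matrix?

Answer:

(a) For n = 1, 2, ..., the power $P^n$ always has a 0 in its upper right-hand corner. Thus, no integer power of P has all positive entries.

(b) $P^n \to \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$ as n increases, so $P^n\mathbf{x}^{(0)} \to \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any $\mathbf{x}^{(0)}$ as n increases.

(c) The entries of the limiting vector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are not all positive.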


5. Verify that if P is a k × k regular transition matrix all of whose row sums are equal to 1, then the entries of its steady-state vector are all equal to 1/k.

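6. Show that the transition matrix

$$P = \begin{bmatrix} 0 & \frac12 & \frac12 \\ \frac12 & \frac12 & 0 \\ \frac12 & 0 & \frac12 \end{bmatrix}$$

is regular, and use Exercise 5 to find its steady-state vector.


Answer:

$$P^2 = \begin{bmatrix} \frac12 & \frac14 & \frac14 \\ \frac14 & \frac12 & \frac14 \\ \frac14 & \frac14 & \frac12 \end{bmatrix}$$

has all positive entries; $\mathbf{q} = \begin{bmatrix} \frac13 \\ \frac13 \\ \frac13 \end{bmatrix}$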


7. John is either happy or sad. If he is happy one day, then he is happy the next day four times out of five. If he 
is sad one day, then he is sad the next day one time out of three. Over the long term, what are the chances that 
John is happy on any given day? 

Answer: 

$\frac{10}{13}$

8. A country is divided into three demographic regions. It is found that each year 5% of the residents of region 1 
move to region 2, and 5% move to region 3. Of the residents of region 2, 15% move to region 1 and 10% 
move to region 3. And of the residents of region 3, 10% move to region 1 and 5% move to region 2. What 
percentage of the population resides in each of the three regions after a long period of time? 

Answer: 

$54\frac{1}{6}\%$ in region 1, $16\frac{2}{3}\%$ in region 2, and $29\frac{1}{6}\%$ in region 3

Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 

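T1. Consider the sequence of transition matrices

$$\{P_2, P_3, P_4, \ldots\}$$

with

$$P_2 = \begin{bmatrix} 0 & \frac12 \\ 1 & \frac12 \end{bmatrix}, \qquad P_3 = \begin{bmatrix} 0 & 0 & \frac13 \\ 0 & \frac12 & \frac13 \\ 1 & \frac12 & \frac13 \end{bmatrix},$$

$$P_4 = \begin{bmatrix} 0 & 0 & 0 & \frac14 \\ 0 & 0 & \frac13 & \frac14 \\ 0 & \frac12 & \frac13 & \frac14 \\ 1 & \frac12 & \frac13 & \frac14 \end{bmatrix}, \qquad P_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & \frac15 \\ 0 & 0 & 0 & \frac14 & \frac15 \\ 0 & 0 & \frac13 & \frac14 & \frac15 \\ 0 & \frac12 & \frac13 & \frac14 & \frac15 \\ 1 & \frac12 & \frac13 & \frac14 & \frac15 \end{bmatrix}$$

and so on.

(a) Use a computer to show that each of these four matrices is regular by computing their squares.

(b) Verify Theorem 10.5.2 by computing the 100th power of $P_k$ for k = 2, 3, 4, 5. Then make a conjecture as to the limiting value of $P_k^n$ as $n \to \infty$ for all k = 2, 3, 4, ....

(c) Verify that the common column $\mathbf{q}_k$ of the limiting matrix you found in part (b) satisfies the equation $P_k\mathbf{q}_k = \mathbf{q}_k$, as required by Theorem 10.5.4.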


T2. A mouse is placed in a box with nine rooms as shown in the accompanying figure. Assume that it is equally 
likely that the mouse goes through any door in the room or stays in the room. 

(a) Construct the 9 x 9 transition matrix for this problem and show that it is regular. 

(b) Determine the steady-state vector for the matrix. 

(c) Use a symmetry argument to show that this problem may be solved using only a 3 x 3 matrix. 



Figure Ex-T2 
10.6 Graph Theory

In this section we introduce matrix representations of relations among members of a set. We use matrix 
arithmetic to analyze these relationships. 


Prerequisites 

Matrix Addition and Multiplication 


Relations Among Members of a Set 

There are countless examples of sets with finitely many members in which some relation exists among 
members of the set. For example, the set could consist of a collection of people, animals, countries, 
companies, sports teams, or cities; and the relation between two members, A and B, of such a set could be that
person A dominates person B, animal A feeds on animal B, country A militarily supports country B, company
A sells its product to company B, sports team A consistently beats sports team B, or city A has a direct airline
flight to city B.

We will now show how the theory of directed graphs can be used to mathematically model relations such as 
those in the preceding examples. 


Directed Graphs 

A directed graph is a finite set of elements, $\{P_1, P_2, \ldots, P_n\}$, together with a finite collection of ordered pairs $(P_i, P_j)$ of distinct elements of this set, with no ordered pair being repeated. The elements of the set are called vertices, and the ordered pairs are called directed edges, of the directed graph. We use the notation $P_i \to P_j$ (which is read "$P_i$ is connected to $P_j$") to indicate that the directed edge $(P_i, P_j)$ belongs to the directed graph. Geometrically, we can visualize a directed graph (Figure 10.6.1) by representing the vertices as points in the plane and representing the directed edge $P_i \to P_j$ by drawing a line or arc from vertex $P_i$ to vertex $P_j$, with an arrow pointing from $P_i$ to $P_j$. If both $P_i \to P_j$ and $P_j \to P_i$ hold (denoted $P_i \leftrightarrow P_j$), we draw a single line between $P_i$ and $P_j$ with two oppositely pointing arrows (as with one pair of vertices in the figure).

Figure 10.6.1


As in Figure 10.6.1, for example, a directed graph may have separate "components" of vertices that are connected only among themselves; and some vertices, such as $P_5$, may not be connected with any other vertex. Also, because $P_i \to P_i$ is not permitted in a directed graph, a vertex cannot be connected with itself by a single arc that does not pass through any other vertex.

Figure 10.6.2 shows diagrams representing three more examples of directed graphs. With a directed graph having n vertices, we may associate an n × n matrix $M = [m_{ij}]$, called the vertex matrix of the directed graph. Its elements are defined by

$$m_{ij} = \begin{cases} 1, & \text{if } P_i \to P_j \\ 0, & \text{otherwise} \end{cases}$$

for i, j = 1, 2, ..., n. For the three directed graphs in Figure 10.6.2, the corresponding vertex matrices are

Figure a: $$M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Figure b: $$M = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}$$

Figure c: $$M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$

Figure 10.6.2


By their definition, vertex matrices have the following two properties: 

(i) All entries are either 0 or 1.

(ii) All diagonal entries are 0.

Conversely, any matrix with these two properties determines a unique directed graph having the given matrix 
as its vertex matrix. For example, the matrix 


$$M = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

determines the directed graph in Figure 10.6.3.




Figure 10.6.3 


EXAMPLE 1 Influences Within a Family 


A certain family consists of a mother, father, daughter, and two sons. The family members have 
influence, or power, over each other in the following ways: the mother can influence the 
daughter and the oldest son; the father can influence the two sons; the daughter can influence 
the father; the oldest son can influence the youngest son; and the youngest son can influence the 
mother. We may model this family influence pattern with a directed graph whose vertices are 
the five family members. If family member A influences family member B, we write A → B. Figure 10.6.4 is the resulting directed graph, where we have used obvious letter designations for the five family members. The vertex matrix of this directed graph is

$$\begin{array}{c|ccccc} & M & F & D & OS & YS \\ \hline M & 0 & 0 & 1 & 1 & 0 \\ F & 0 & 0 & 0 & 1 & 1 \\ D & 0 & 1 & 0 & 0 & 0 \\ OS & 0 & 0 & 0 & 0 & 1 \\ YS & 1 & 0 & 0 & 0 & 0 \end{array}$$

Figure 10.6.4


EXAMPLE 2 Vertex Matrix: Moves on a Chessboard 

In chess the knight moves in an “L”-shaped pattern about the chessboard. For the board in Figure 10.6.5 it may move horizontally two squares and then vertically one square, or it may move vertically two squares and then horizontally one square. Thus, from the center square in the figure, the knight may move to any of the eight shaded squares. Suppose that the knight is restricted to the nine numbered squares in Figure 10.6.6. If by i → j we mean that the knight may move from square i to square j, the directed graph in Figure 10.6.7 illustrates all possible moves that the knight may make among these nine squares. In Figure 10.6.8 we have “unraveled” Figure 10.6.7 to make the pattern of possible moves clearer.

The vertex matrix of this directed graph is given by

$$M = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

Figure 10.6.5

Figure 10.6.8
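A vertex matrix like this one can be generated rather than written out by hand. The Python sketch below assumes that the squares in Figure 10.6.6 are numbered 1 to 9 left to right, top to bottom. The figure is not reproduced here, so that numbering is an assumption, but it yields exactly the matrix M above. The function name is hypothetical.

```python
def knight_vertex_matrix(rows=3, cols=3):
    """Vertex matrix of knight moves on a rows x cols board.

    Squares are numbered 1, 2, ... left to right, top to bottom (an assumed
    numbering); entry (i, j) is 1 when a knight can jump from square i to square j.
    """
    n = rows * cols
    M = [[0] * n for _ in range(n)]
    jumps = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in jumps:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    M[r * cols + c][r2 * cols + c2] = 1
    return M


for row in knight_vertex_matrix():
    print(" ".join(map(str, row)))
```

Because a knight move can always be reversed, the matrix comes out symmetric. The center square (row 5) has no moves on a 3 × 3 board.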

In Example 1 the father cannot directly influence the mother; that is, F → M is not true. But he can influence the youngest son, who can then influence the mother. We write this as F → YS → M and call it a 2-step connection from F to M. Analogously, we call M → D a 1-step connection, F → OS → YS → M a 3-step connection, and so forth. Let us now consider a technique for finding the number of all possible r-step connections (r = 1, 2, ...) from one vertex $P_i$ to another vertex $P_j$ of an arbitrary directed graph. (This will include the case when $P_i$ and $P_j$ are the same vertex.) The number of 1-step connections from $P_i$ to $P_j$ is simply $m_{ij}$. That is, there is either zero or one 1-step connection from $P_i$ to $P_j$, depending on whether $m_{ij}$ is zero or one. For the number of 2-step connections, we consider the square of the vertex matrix. If we let $m_{ij}^{(2)}$ be the (i, j)-th element of $M^2$, we have

$$m_{ij}^{(2)} = m_{i1}m_{1j} + m_{i2}m_{2j} + \cdots + m_{in}m_{nj} \qquad (1)$$

Now, if $m_{i1} = m_{1j} = 1$, there is a 2-step connection $P_i \to P_1 \to P_j$ from $P_i$ to $P_j$. But if either $m_{i1}$ or $m_{1j}$ is zero, such a 2-step connection is not possible. Thus $P_i \to P_1 \to P_j$ is a 2-step connection if and only if $m_{i1}m_{1j} = 1$. Similarly, for any k = 1, 2, ..., n, $P_i \to P_k \to P_j$ is a 2-step connection from $P_i$ to $P_j$ if and only if the term $m_{ik}m_{kj}$ on the right side of (1) is one; otherwise, the term is zero. Thus, the right side of (1) is the total number of 2-step connections from $P_i$ to $P_j$.

A similar argument will work for finding the number of 3-, 4-, ..., r-step connections from $P_i$ to $P_j$. In general, we have the following result.


THEOREM 10.6.1 

Let M be the vertex matrix of a directed graph and let $m_{ij}^{(r)}$ be the (i, j)-th element of $M^r$. Then $m_{ij}^{(r)}$ is equal to the number of r-step connections from $P_i$ to $P_j$.


EXAMPLE 3 Using Theorem 10.6.1 


Figure 10.6.9 is the route map of a small airline that services the four cities $P_1, P_2, P_3, P_4$. As
a directed graph, its vertex matrix is

$$M = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}$$

We have that

$$M^2 = \begin{bmatrix} 2 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 2 & 0 & 1 & 1 \end{bmatrix} \quad\text{and}\quad M^3 = \begin{bmatrix} 1 & 3 & 3 & 1 \\ 2 & 2 & 3 & 1 \\ 4 & 0 & 2 & 2 \\ 1 & 3 & 3 & 1 \end{bmatrix}$$

If we are interested in connections from city $P_4$ to city $P_3$, we may use Theorem 10.6.1 to find
their number. Because $m_{43} = 1$, there is one 1-step connection; because $m_{43}^{(2)} = 1$, there is one
2-step connection; and because $m_{43}^{(3)} = 3$, there are three 3-step connections. To verify this,
from Figure 10.6.9 we find

1-step connections from $P_4$ to $P_3$: $P_4 \to P_3$

2-step connections from $P_4$ to $P_3$: $P_4 \to P_2 \to P_3$

3-step connections from $P_4$ to $P_3$: $P_4 \to P_3 \to P_4 \to P_3$
$P_4 \to P_2 \to P_1 \to P_3$
$P_4 \to P_3 \to P_1 \to P_3$

Figure 10.6.9
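Theorem 10.6.1 can be checked on Example 3 by computing powers of M and, independently, listing every r-step connection by brute force. The Python sketch below uses plain lists (no matrix library is assumed); the helper names are illustrative, and vertices are indexed from 0, so entry (4, 3) is `[3][2]`.

```python
def mat_mult(A, B):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_power(M, r):
    """M^r for r >= 1."""
    result = M
    for _ in range(r - 1):
        result = mat_mult(result, M)
    return result


def r_step_connections(M, i, j, r):
    """Brute-force list of all r-step connections from vertex i to vertex j (0-based)."""
    paths = [[i]]
    for _ in range(r):
        paths = [p + [k] for p in paths for k in range(len(M)) if M[p[-1]][k] == 1]
    return [p for p in paths if p[-1] == j]


# Vertex matrix of the airline route map in Example 3
M = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]

for r in (1, 2, 3):
    count = mat_power(M, r)[3][2]           # (4, 3)-th entry of M^r
    paths = r_step_connections(M, 3, 2, r)  # connections from P4 to P3
    assert count == len(paths)              # Theorem 10.6.1
    print(r, count, [" -> ".join(f"P{v + 1}" for v in p) for p in paths])
```

The brute-force search grows exponentially with r, which is exactly why the matrix-power count is useful for larger graphs.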


Cliques 

In everyday language a “clique” is a closely knit group of people (usually three or more) that tends to 
communicate within itself and has no place for outsiders. In graph theory this concept is given a more precise 
meaning. 








DEFINITION 1

A subset of a directed graph is called a clique if it satisfies the following three conditions:

(i) The subset contains at least three vertices.

(ii) For each pair of vertices $P_i$ and $P_j$ in the subset, both $P_i \to P_j$ and $P_j \to P_i$ are true.

(iii) The subset is as large as possible; that is, it is not possible to add another vertex to the subset and
still satisfy condition (ii).

This definition suggests that cliques are maximal subsets that are in perfect “communication” with each other.
For example, if the vertices represent cities, and $P_i \to P_j$ means that there is a direct airline flight from city
$P_i$ to city $P_j$, then there is a direct flight between any two cities within a clique in either direction.

EXAMPLE 4 A Directed Graph with Two Cliques 

The directed graph illustrated in Figure 10.6.10 (which might represent the route map of an 
airline) has two cliques: 

$\{P_1, P_2, P_4\}$ and $\{P_3, P_4, P_6\}$

This example shows that a directed graph may contain several cliques and that a vertex may 
simultaneously belong to more than one clique. 

Figure 10.6.10


For simple directed graphs, cliques can be found by inspection. But for large directed graphs, it would be
desirable to have a systematic procedure for detecting cliques. For this purpose, it will be helpful to define a
matrix $S = [s_{ij}]$ related to a given directed graph as follows:

$$s_{ij} = \begin{cases} 1 & \text{if } P_i \leftrightarrow P_j \\ 0 & \text{otherwise} \end{cases}$$

The matrix S determines a directed graph that is the same as the given directed graph, with the exception that
the directed edges with only one arrow are deleted. For example, if the original directed graph is given by
Figure 10.6.11a, the directed graph that has S as its vertex matrix is given in Figure 10.6.11b. The matrix S
may be obtained from the vertex matrix M of the original directed graph by setting $s_{ij} = 1$ if $m_{ij} = m_{ji} = 1$
and setting $s_{ij} = 0$ otherwise.

Figure 10.6.11 (a) original directed graph; (b) directed graph whose vertex matrix is S
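The rule "$s_{ij} = 1$ exactly when $m_{ij} = m_{ji} = 1$" is a one-line computation. The Python sketch below applies it to the vertex matrix of Example 5; the function name is illustrative, not from the text.

```python
def symmetric_relation_matrix(M):
    """Return S with s_ij = 1 exactly when m_ij = m_ji = 1 (that is, P_i <-> P_j)."""
    n = len(M)
    return [[1 if M[i][j] == 1 and M[j][i] == 1 else 0 for j in range(n)]
            for i in range(n)]


# Vertex matrix from Example 5
M = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 0, 0]]
S = symmetric_relation_matrix(M)
print(S)
```

For a 0-1 vertex matrix, S is the same as the entrywise product of M with its transpose, and it is always symmetric.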


The following theorem, which uses the matrix S, is helpful for identifying cliques. 


THEOREM 10.6.2 Identifying Cliques

Let $s_{ij}^{(3)}$ be the (i, j)-th element of $S^3$. Then a vertex $P_i$ belongs to some clique if and only if $s_{ii}^{(3)} \ne 0$.

Proof. If $s_{ii}^{(3)} \ne 0$, then there is at least one 3-step connection from $P_i$ to itself in the modified directed graph
determined by S. Suppose it is $P_i \to P_j \to P_k \to P_i$. (The three vertices are distinct because S has zeros on its
diagonal.) In the modified directed graph, all directed relations are two-way, so we also have the connections
$P_i \leftrightarrow P_j \leftrightarrow P_k \leftrightarrow P_i$. But this means that $\{P_i, P_j, P_k\}$ is either a clique or a subset of a clique. In either
case, $P_i$ must belong to some clique. The converse statement, “if $P_i$ belongs to a clique, then $s_{ii}^{(3)} \ne 0$,” follows in a
similar manner.
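Theorem 10.6.2 translates directly into a test on the diagonal of $S^3$. The Python sketch below is an illustration with hypothetical helper names; it reports only which vertices lie in some clique, so grouping them into the cliques themselves still takes inspection, as in Example 6. It is checked against the vertex matrices of Examples 5 and 6 below.

```python
def mat_mult(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def clique_vertices(M):
    """1-based labels i of the vertices P_i with s_ii^(3) != 0 (Theorem 10.6.2)."""
    n = len(M)
    S = [[1 if M[i][j] == 1 and M[j][i] == 1 else 0 for j in range(n)] for i in range(n)]
    S3 = mat_mult(mat_mult(S, S), S)
    return [i + 1 for i in range(n) if S3[i][i] != 0]


example_5 = [[0, 1, 1, 1],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [1, 0, 0, 0]]
example_6 = [[0, 1, 0, 1, 1],
             [1, 0, 0, 1, 0],
             [1, 1, 0, 1, 0],
             [1, 1, 0, 0, 0],
             [1, 0, 0, 1, 0]]
print(clique_vertices(example_5))
print(clique_vertices(example_6))
```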


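EXAMPLE 5 Using Theorem 10.6.2


Suppose that a directed graph has as its vertex matrix

$$M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$

Then

$$S = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad S^3 = \begin{bmatrix} 0 & 3 & 0 & 2 \\ 3 & 0 & 2 & 0 \\ 0 & 2 & 0 & 1 \\ 2 & 0 & 1 & 0 \end{bmatrix}$$

Because all diagonal entries of $S^3$ are zero, it follows from Theorem 10.6.2 that the directed
graph has no cliques.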


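EXAMPLE 6 Using Theorem 10.6.2


Suppose that a directed graph has as its vertex matrix

$$M = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \end{bmatrix}$$

Then

$$S = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad S^3 = \begin{bmatrix} 2 & 4 & 0 & 4 & 3 \\ 4 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 4 & 3 & 0 & 2 & 1 \\ 3 & 1 & 0 & 1 & 0 \end{bmatrix}$$

The nonzero diagonal entries of $S^3$ are $s_{11}^{(3)}$, $s_{22}^{(3)}$, and $s_{44}^{(3)}$. Consequently, in the given directed
graph, $P_1$, $P_2$, and $P_4$ belong to cliques. Because a clique must contain at least three vertices,
the directed graph has only one clique, $\{P_1, P_2, P_4\}$.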


Dominance-Directed Graphs 

In many groups of individuals or animals, there is a definite “pecking order” or dominance relation between
any two members of the group. That is, given any two individuals A and B, either A dominates B or B
dominates A, but not both. In terms of a directed graph in which $P_i \to P_j$ means $P_i$ dominates $P_j$, this means
that for all distinct pairs, either $P_i \to P_j$ or $P_j \to P_i$, but not both. In general, we have the following
definition.


DEFINITION 2

A dominance-directed graph is a directed graph such that for any distinct pair of vertices $P_i$ and $P_j$,
either $P_i \to P_j$ or $P_j \to P_i$, but not both.


An example of a directed graph satisfying this definition is a league of n sports teams that play each other
exactly one time, as in one round of a round-robin tournament in which no ties are allowed. If $P_i \to P_j$ means
that team $P_i$ beat team $P_j$ in their single match, it is easy to see that the definition of a dominance-directed
graph is satisfied. For this reason, dominance-directed graphs are sometimes called tournaments.

Figure 10.6.12 illustrates some dominance-directed graphs with three, four, and five vertices, respectively. In 
these three graphs, the circled vertices have the following interesting property: from each one there is either a 
1-step or a 2-step connection to any other vertex in its graph. In a sports tournament, these vertices would 
correspond to the most “powerful” teams in the sense that these teams either beat any given team or beat 
some other team that beat the given team. We can now state and prove a theorem that guarantees that any 
dominance-directed graph has at least one vertex with this property. 


THEOREM 10.6.3 Connections in Dominance-Directed Graphs

In any dominance-directed graph, there is at least one vertex from which there is a 1-step or 2-step
connection to any other vertex.


Proof. Consider a vertex (there may be several) with the largest total number of 1-step and 2-step
connections to other vertices in the graph. By renumbering the vertices, we may assume that $P_1$ is such a
vertex. Suppose there is some vertex $P_i$ such that there is no 1-step or 2-step connection from $P_1$ to $P_i$. Then,
in particular, $P_1 \to P_i$ is not true, so that by definition of a dominance-directed graph, it must be that
$P_i \to P_1$. Next, let $P_k$ be any vertex such that $P_1 \to P_k$ is true. Then we cannot have $P_k \to P_i$, as then
$P_1 \to P_k \to P_i$ would be a 2-step connection from $P_1$ to $P_i$. Thus, it must be that $P_i \to P_k$. That is, $P_i$ has
1-step connections to all the vertices to which $P_1$ has 1-step connections. The vertex $P_i$ must then also have
2-step connections to all the vertices to which $P_1$ has 2-step connections. But because, in addition, we have
that $P_i \to P_1$, this means that $P_i$ has more 1-step and 2-step connections to other vertices than does $P_1$.
However, this contradicts the way in which $P_1$ was chosen. Hence, there can be no vertex $P_i$ to which $P_1$ has
no 1-step or 2-step connection.
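For small graphs the proof can be spot-checked by brute force. The Python sketch below is a sanity check, not a substitute for the proof. It enumerates every dominance-directed graph on n labeled vertices (one orientation per pair) and confirms that every vertex with the largest total of 1-step and 2-step connections reaches all other vertices in one or two steps. The function names are illustrative.

```python
from itertools import combinations, product


def tournaments(n):
    """Yield the vertex matrix of every dominance-directed graph on n labeled vertices."""
    pairs = list(combinations(range(n), 2))
    for outcome in product((True, False), repeat=len(pairs)):
        M = [[0] * n for _ in range(n)]
        for (i, j), i_dominates in zip(pairs, outcome):
            if i_dominates:
                M[i][j] = 1
            else:
                M[j][i] = 1
        yield M


def total_connections(M, v):
    """Number of 1-step plus 2-step connections from vertex v (row v of M + M^2)."""
    n = len(M)
    return sum(M[v]) + sum(M[v][k] * M[k][j] for k in range(n) for j in range(n))


def reaches_all(M, v):
    """True if v has a 1-step or 2-step connection to every other vertex."""
    n = len(M)
    one = {j for j in range(n) if M[v][j]}
    two = {j for k in one for j in range(n) if M[k][j]}
    return all(j in one or j in two for j in range(n) if j != v)


def check_theorem(n):
    """Check the conclusion for every vertex of largest total, over all graphs on n vertices."""
    for M in tournaments(n):
        totals = [total_connections(M, v) for v in range(n)]
        best = max(totals)
        if not all(reaches_all(M, v) for v in range(n) if totals[v] == best):
            return False
    return True


print(check_theorem(4), check_theorem(5))
```

There are $2^{n(n-1)/2}$ such graphs, so this exhaustive check is only practical for very small n.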



Figure 10.6.12

This proof shows that a vertex with the largest total number of 1-step and 2-step connections to other vertices
has the property stated in the theorem. There is a simple way of finding such vertices using the vertex matrix
M and its square $M^2$. The sum of the entries in the ith row of M is the total number of 1-step connections
from $P_i$ to other vertices, and the sum of the entries of the ith row of $M^2$ is the total number of 2-step
connections from $P_i$ to other vertices. (In a dominance-directed graph the diagonal entries of $M^2$ are zero, since
$P_i \to P_j$ and $P_j \to P_i$ cannot both hold, so no 2-step connection returns to $P_i$.) Consequently, the sum of the
entries of the ith row of the matrix $A = M + M^2$ is the total number of 1-step and 2-step connections from $P_i$ to
other vertices. In other words, a row of $A = M + M^2$ with the largest row sum identifies a vertex having the
property stated in Theorem 10.6.3.
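The row-sum recipe is easy to automate. The Python sketch below, using an illustrative helper name, forms $A = M + M^2$ for the five-team vertex matrix of Example 7 below, reads off the row sums, and sorts the vertices by them.

```python
def vertex_powers(M):
    """Return A = M + M^2 and its row sums (total 1-step plus 2-step connections)."""
    n = len(M)
    M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    A = [[M[i][j] + M2[i][j] for j in range(n)] for i in range(n)]
    return A, [sum(row) for row in A]


# Vertex matrix of the five-team tournament in Example 7
M = [[0, 0, 1, 1, 0],
     [1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0],
     [0, 1, 0, 0, 0],
     [1, 0, 1, 1, 0]]

A, row_sums = vertex_powers(M)
strongest = row_sums.index(max(row_sums)) + 1  # 1-based label of a vertex with the largest row sum
ranking = sorted(range(1, len(M) + 1), key=lambda v: -row_sums[v - 1])
print(row_sums, strongest, ranking)
```

Because `sorted` is stable, tied vertices keep their original order in `ranking`.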

EXAMPLE 7 Using Theorem 10.6.3 

Suppose that five baseball teams play each other exactly once, and the results are as indicated in 
the dominance-directed graph of Figure 10.6.13. The vertex matrix of the graph is 


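$$M = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}$$

so

$$A = M + M^2 = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 2 & 3 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 & 2 & 0 \\ 2 & 0 & 3 & 3 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 2 & 3 & 0 \end{bmatrix}$$

The row sums of A are

1st row sum = 4
2nd row sum = 9
3rd row sum = 2
4th row sum = 4
5th row sum = 7

Because the second row has the largest row sum, the vertex $P_2$ must have a 1-step or 2-step
connection to any other vertex. This is easily verified from Figure 10.6.13.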


Figure 10.6.13


We have informally suggested that a vertex with the largest number of 1-step and 2-step connections to other
vertices is a “powerful” vertex. We can formalize this concept with the following definition.


DEFINITION 3

The power of a vertex of a dominance-directed graph is the total number of 1-step and 2-step
connections from it to other vertices. Alternatively, the power of a vertex $P_i$ is the sum of the entries
of the ith row of the matrix $A = M + M^2$, where M is the vertex matrix of the directed graph.










EXAMPLE 8 Example 7 Revisited 


Let us rank the five baseball teams in Example 7 according to their powers. From the 
calculations for the row sums in that example, we have 

Power of team $P_1$ = 4
Power of team $P_2$ = 9
Power of team $P_3$ = 2
Power of team $P_4$ = 4
Power of team $P_5$ = 7

Hence, the ranking of the teams according to their powers would be

$P_2$ (first), $P_5$ (second), $P_1$ and $P_4$ (tied for third), $P_3$ (last)


Exercise Set 10.6 

1. Construct the vertex matrix for each of the directed graphs illustrated in Figure Ex-1. 


Figure Ex-1 (a), (b), (c)


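Answer:

(a) $$\begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

(b) $$\begin{bmatrix} 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$$

(c) $$\begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}$$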

2. Draw a diagram of the directed graph corresponding to each of the following vertex matrices. 


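(a) $$\begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}$$

(b) $$\begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \end{bmatrix}$$

(c) $$\begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 \end{bmatrix}$$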

Answer:

(a) [diagram]    (b) [diagram]

3. Let M be the following vertex matrix of a directed graph:

$$M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}$$

(a) Draw a diagram of the directed graph.

(b) Use Theorem 10.6.1 to find the number of 1-, 2-, and 3-step connections from the vertex $P_1$ to the
vertex $P_2$. Verify your answer by listing the various connections as in Example 3.

(c) Repeat part (b) for the 1-, 2-, and 3-step connections from $P_1$ to $P_4$.

Answer:

(a) [diagram]

(b) 1-step: $P_1 \to P_2$
2-step: $P_1 \to P_4 \to P_2$, $P_1 \to P_3 \to P_2$
3-step: $P_1 \to P_2 \to P_1 \to P_2$, $P_1 \to P_3 \to P_4 \to P_2$, $P_1 \to P_4 \to P_3 \to P_2$

(c) 1-step: $P_1 \to P_4$
2-step: $P_1 \to P_3 \to P_4$
3-step: $P_1 \to P_2 \to P_1 \to P_4$, $P_1 \to P_4 \to P_3 \to P_4$


4. (a) Compute the matrix product $M^TM$ for the vertex matrix M in Example 1.

(b) Verify that the kth diagonal entry of $M^TM$ is the number of family members who influence the kth
family member. Why is this true?

(c) Find a similar interpretation for the values of the nondiagonal entries of $M^TM$.


Answer:

(a) $$M^TM = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

(c) The ijth entry is the number of family members who influence both the ith and jth family members.


5. By inspection, locate all cliques in each of the directed graphs illustrated in Figure Ex-5. 


Figure Ex-5 (a), (b), (c)


Answer:

(a) $\{P_1, P_2, P_3\}$

(b) $\{P_3, P_4, P_5\}$

(c) $\{P_2, P_4, P_6, P_8\}$ and $\{P_4, P_5, P_6\}$

6. For each of the following vertex matrices, use Theorem 10.6.2 to find all cliques in the corresponding 
directed graphs. 




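(a) $$\begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 \end{bmatrix}$$

(b) $$\begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 & 0 \end{bmatrix}$$


Answer:

(a) None

(b) $\{P_3, P_4, P_6\}$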


7. For the dominance-directed graph illustrated in Figure Ex-7 construct the vertex matrix and find the power
of each vertex.

Figure Ex-7


Answer:

$$M = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

Power of $P_1$ = 5
Power of $P_2$ = 3
Power of $P_3$ = 4
Power of $P_4$ = 2


8. Five baseball teams play each other one time with the following results:

A beats B, C, D 
B beats C, E 
C beats D, E 

D beats B 
E beats A, D 

Rank the five baseball teams in accordance with the powers of the vertices they correspond to in the 
dominance-directed graph representing the outcomes of the games. 








Answer: 


First, A; second, B and E (tie); fourth, C; fifth, D 

Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 

T1. A graph having n vertices such that every vertex is connected to every other vertex has a vertex matrix
given by

$$M_n = \begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 1 & \cdots & 1 \\ 1 & 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 0 \end{bmatrix}$$

In this problem we develop a formula for $M_n^k$ whose (i, j)-th entry equals the number of k-step connections
from $P_i$ to $P_j$.

(a) Use a computer to compute the eight matrices $M_n^k$ for n = 2, 3 and for k = 2, 3, 4, 5.

(b) Use the results in part (a) and symmetry arguments to show that $M_n^k$ can be written as

$$M_n^k = \begin{bmatrix} 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{bmatrix}^k = \begin{bmatrix} \alpha_k & \beta_k & \cdots & \beta_k \\ \beta_k & \alpha_k & \cdots & \beta_k \\ \vdots & \vdots & \ddots & \vdots \\ \beta_k & \beta_k & \cdots & \alpha_k \end{bmatrix}$$

where $U_n$ is the n x n matrix all of whose entries are ones and $I_n$ is the n x n identity matrix.

(f) Show that for n > 2, all vertices for these directed graphs belong to cliques.
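Parts (a) and (b) of T1 can be explored with a few lines of Python in place of MATLAB or Mathematica. The sketch below uses illustrative helper names. It builds $M_n$, raises it to the kth power, checks that every diagonal entry equals one number $\alpha_k$ and every off-diagonal entry equals one number $\beta_k$, and returns that pair.

```python
def mat_mult(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def complete_vertex_matrix(n):
    """M_n: zeros on the diagonal, ones everywhere else."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def alpha_beta(n, k):
    """Return (alpha_k, beta_k) after checking that M_n^k has the form in part (b); needs n >= 2."""
    M = complete_vertex_matrix(n)
    Mk = M
    for _ in range(k - 1):
        Mk = mat_mult(Mk, M)
    alpha, beta = Mk[0][0], Mk[0][1]
    assert all(Mk[i][i] == alpha for i in range(n))
    assert all(Mk[i][j] == beta for i in range(n) for j in range(n) if i != j)
    return alpha, beta


for n in (2, 3):
    for k in (2, 3, 4, 5):
        print(n, k, alpha_beta(n, k))
```

The internal assertions are the computational counterpart of the symmetry argument: they fail if any diagonal or off-diagonal entry differs from the first one.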

T2. Consider a round-robin tournament among n players (labeled $a_1, a_2, a_3, \ldots, a_n$) where $a_1$ beats $a_2$, $a_2$
beats $a_3$, $a_3$ beats $a_4$, ..., $a_{n-1}$ beats $a_n$, and $a_n$ beats $a_1$. Compute the “power” of each player, showing that
they all have the same power; then determine that common power.

[Hint: Use a computer to study the cases n = 3, 4, 5, 6; then make a conjecture and prove your conjecture to
be true.]
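For the computer part of the hint, the sketch below builds a vertex matrix that records only the n games listed in T2 ($a_i$ beats $a_{i+1}$, with $a_n$ beating $a_1$) and prints the row sums of $M + M^2$ for n = 3, 4, 5, 6. The helper names are illustrative, and the players $a_1, \ldots, a_n$ are indexed 0 to n - 1 in the code.

```python
def cycle_tournament(n):
    """Vertex matrix with a 1 in position (i, i+1 mod n): a_i beats the next player."""
    return [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]


def powers(M):
    """Row sums of A = M + M^2 (Definition 3)."""
    n = len(M)
    M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [sum(M[i]) + sum(M2[i]) for i in range(n)]


for n in range(3, 7):
    print(n, powers(cycle_tournament(n)))
```

Inspecting the printed lists is the "study the cases" step; the conjecture still needs a proof for general n.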





10.7 Games of Strategy 

In this section we discuss a general game in which two competing players choose separate strategies to reach opposing 
objectives. The optimal strategy of each player is found in certain cases with the use of matrix techniques. 


Prerequisites 

Matrix Multiplication 
Basic Probability Concepts 


Game Theory 


To introduce the basic concepts in the theory of games, we will consider the following carnival-type game that two 
people agree to play. We will call the participants in the game player R and player C. Each player has a stationary wheel 
with a movable pointer on it as in Figure 10.7.1. For reasons that will become clear, we will call player R's wheel the 
row-wheel and player C's wheel the column-wheel. The row-wheel is divided into three sectors numbered 1, 2, and 3, 
and the column-wheel is divided into four sectors numbered 1, 2, 3, and 4. The fractions of the area occupied by the 
various sectors are indicated in the figure. To play the game, each player spins the pointer of his or her wheel and lets it 
come to rest at random. The number of the sector in which each pointer comes to rest is called the move of that player. 
Thus, player R has three possible moves and player C has four possible moves. Depending on the move each player 
makes, player C then makes a payment of money to player R according to Table 1. 




Figure 10.7.1 (row-wheel of player R; column-wheel of player C)


Table 1

                         Player C's Move
                         1      2      3      4
Player R's      1       $3     $5    -$2    -$1
Move            2      -$2     $4    -$3    -$4
                3       $6    -$5     $0     $3


For example, if the row-wheel pointer comes to rest in sector 1 (player R makes move 1), and the column-wheel pointer 
comes to rest in sector 2 (player C makes move 2), then player C must pay player R the sum of $5. Some of the entries 
in this table are negative, indicating that player C makes a negative payment to player R. By this we mean that player R 
makes a positive payment to player C. For example, if the row-wheel shows 2 and the column-wheel shows 4, then 
player R pays player C the sum of $4, because the corresponding entry in the table is -$4. In this way the positive entries 
of the table are the gains of player R and the losses of player C, and the negative entries are the gains of player C and the
losses of player R.

In this game the players have no control over their moves; each move is determined by chance. However, if each player 
can decide whether he or she wants to play, then each would want to know how much he or she can expect to win or lose 
over the long term if he or she chooses to play. (Later in the section we will discuss this question and also consider a 
more complicated situation in which the players can exercise some control over their moves by varying the sectors of 
their wheels.) 


Two-Person Zero-Sum Matrix Games 

The game described above is an example of a two-person zero-sum matrix game. The term zero-sum means that in each 
play of the game, the positive gain of one player is equal to the negative gain (loss) of the other player. That is, the sum 
of the two gains is zero. The term matrix game is used to describe a two-person game in which each player has only a 
finite number of moves, so that all possible outcomes of each play, and the corresponding gains of the players, can be 
displayed in tabular or matrix form, as in Table 1. 

In a general game of this type, let player R have m possible moves and let player C have n possible moves. In a play of
the game, each player makes one of his or her possible moves, and then a payoff is made from player C to player R,
depending on the moves. For i = 1, 2, ..., m, and j = 1, 2, ..., n, let us set

$a_{ij}$ = payoff that player C makes to player R if player R makes move i and player C makes move j

This payoff need not be money; it can be any type of commodity to which we can attach a numerical value. As before, if
an entry $a_{ij}$ is negative, we mean that player C receives a payoff of $|a_{ij}|$ from player R. We arrange these mn possible
payoffs in the form of an m x n matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

which we will call the payoff matrix of the game.

Each player is to make his or her moves on a probabilistic basis. For example, for the game discussed in the
introduction, the ratio of the area of a sector to the area of the wheel would be the probability that the player makes the
move corresponding to that sector. Thus, from Figure 10.7.1, we see that player R would make move 2 with probability
1/3, and player C would make move 2 with probability 1/4. In the general case we make the following definitions:

$p_i$ = probability that player R makes move i   (i = 1, 2, ..., m)
$q_j$ = probability that player C makes move j   (j = 1, 2, ..., n)

It follows from these definitions that

$$p_1 + p_2 + \cdots + p_m = 1 \quad\text{and}\quad q_1 + q_2 + \cdots + q_n = 1$$

With the probabilities $p_i$ and $q_j$ we form two vectors:

$$p = [p_1 \ p_2 \ \cdots \ p_m] \quad\text{and}\quad q = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}$$

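We call the row vector p the strategy of player R and the column vector q the strategy of player C. For example, from
Figure 10.7.1 we have

$$p = \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{2} \end{bmatrix}, \qquad q = \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{3} \\ \tfrac{1}{6} \end{bmatrix}$$

for the carnival game described earlier.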


From the theory of probability, if the probability that player R makes move i is $p_i$, and independently the probability that
player C makes move j is $q_j$, then $p_iq_j$ is the probability that for any one play of the game, player R makes move i and
player C makes move j. The payoff to player R for such a pair of moves is $a_{ij}$. If we multiply each possible payoff by its
corresponding probability and sum over all possible payoffs, we obtain the expression

$$a_{11}p_1q_1 + a_{12}p_1q_2 + \cdots + a_{1n}p_1q_n + a_{21}p_2q_1 + \cdots + a_{mn}p_mq_n \qquad (1)$$

Equation 1 is a weighted average of the payoffs to player R; each payoff is weighted according to the probability of its
occurrence. In the theory of probability, this weighted average is called the expected payoff to player R. It can be shown
that if the game is played many times, the long-term average payoff per play to player R is given by this expression. We
denote this expected payoff by E(p, q) to emphasize the fact that it depends on the strategies of the two players. From
the definition of the payoff matrix A and the strategies p and q, it can be verified that we may express the expected
payoff in matrix notation as

$$E(p, q) = [p_1 \ p_2 \ \cdots \ p_m] \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix} = pAq \qquad (2)$$

Because E(p, q) is the expected payoff to player R, it follows that -E(p, q) is the expected payoff to player C.



EXAMPLE 1 Expected Payoff to Player R 


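For the carnival game described earlier, we have

$$E(p, q) = pAq = \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{2} \end{bmatrix} \begin{bmatrix} 3 & 5 & -2 & -1 \\ -2 & 4 & -3 & -4 \\ 6 & -5 & 0 & 3 \end{bmatrix} \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{3} \\ \tfrac{1}{6} \end{bmatrix} = \frac{13}{72} = .1805\ldots$$

Thus, in the long run, player R can expect to receive an average of about 18 cents from player C in each
play of the game.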
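Equations (1) and (2) describe the same number, once as a double sum and once as a matrix product. The Python sketch below computes both for the carnival game with exact rational arithmetic from the standard `fractions` module. The function names are illustrative.

```python
from fractions import Fraction

# Payoff matrix (Table 1) and the wheel strategies of the carnival game
A = [[3, 5, -2, -1],
     [-2, 4, -3, -4],
     [6, -5, 0, 3]]
p = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 3), Fraction(1, 6)]


def expected_payoff_sum(p, A, q):
    """Equation (1): sum of a_ij * p_i * q_j over all pairs of moves."""
    return sum(A[i][j] * p[i] * q[j] for i in range(len(p)) for j in range(len(q)))


def expected_payoff_matrix(p, A, q):
    """Equation (2): the row vector p times A, then times the column vector q."""
    pA = [sum(p[i] * A[i][j] for i in range(len(p))) for j in range(len(q))]
    return sum(pA[j] * q[j] for j in range(len(q)))


E = expected_payoff_matrix(p, A, q)
assert E == expected_payoff_sum(p, A, q)
print(E, float(E))
```

Using `Fraction` keeps the result exact, so it can be compared directly with the value 13/72 found in Example 1.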


So far we have been discussing the situation in which each player has a predetermined strategy. We will now consider 
the more difficult situation in which both players can change their strategies independently. For example, in the game 
described in the introduction, we would allow both players to alter the areas of the sectors of their wheels and thereby 
control the probabilities of their respective moves. This qualitatively changes the nature of the problem and puts us 
firmly in the field of true game theory. It is understood that neither player knows what strategy the other will choose. It 
is also assumed that each player will make the best possible choice of strategy and that the other player knows this. 

Thus, player R attempts to choose a strategy p such that E(p, q) is as large as possible for the best strategy q that player
C can choose; and similarly, player C attempts to choose a strategy q such that E(p, q) is as small as possible for the
best strategy p that player R can choose. To see that such choices are actually possible, we will need the following
theorem, called the Fundamental Theorem of Two-Person Zero-Sum Games. (The general proof, which involves ideas
from the theory of linear programming, will be omitted. However, below we will prove this theorem for what are called
strictly determined games and 2 x 2 matrix games.)


THEOREM 10.7.1 Fundamental Theorem of Zero-Sum Games

There exist strategies $p^*$ and $q^*$ such that

$$E(p^*, q) \ge E(p^*, q^*) \ge E(p, q^*) \qquad (3)$$

for all strategies p and q.

The strategies $p^*$ and $q^*$ in this theorem are the best possible strategies for players R and C, respectively. To see why
this is so, let $v = E(p^*, q^*)$. The left-hand inequality of Equation 3 then reads

$$E(p^*, q) \ge v \quad \text{for all strategies } q$$

This means that if player R chooses the strategy $p^*$, then no matter what strategy q player C chooses, the expected
payoff to player R will never be below v. Moreover, it is not possible for player R to guarantee an expected payoff greater
than v. To see why, suppose there is some strategy $p^{**}$ that player R can choose such that

$$E(p^{**}, q) > v \quad \text{for all strategies } q$$

Then, in particular,

$$E(p^{**}, q^*) > v$$

But this contradicts the right-hand inequality of Equation 3, which requires that $v \ge E(p^{**}, q^*)$. Consequently, the best
player R can do is prevent his or her expected payoff from falling below the value v. Similarly, the best player C can do
is ensure that player R's expected payoff does not exceed v, and this can be achieved by using strategy $q^*$.

On the basis of this discussion, we arrive at the following definitions. 


DEFINITION 1

If $p^*$ and $q^*$ are strategies such that

$$E(p^*, q) \ge E(p^*, q^*) \ge E(p, q^*) \qquad (4)$$

for all strategies p and q, then

(i) $p^*$ is called an optimal strategy for player R.

(ii) $q^*$ is called an optimal strategy for player C.

(iii) $v = E(p^*, q^*)$ is called the value of the game.

The wording in this definition suggests that optimal strategies are not necessarily unique. This is indeed the case, and in
Exercise 2 we ask you to show this. However, it can be proved that any two sets of optimal strategies always result in
the same value v of the game. That is, if $p^*, q^*$ and $p^{**}, q^{**}$ are optimal strategies, then

$$E(p^*, q^*) = E(p^{**}, q^{**}) \qquad (5)$$

The value of a game is thus the expected payoff to player R when both players choose any possible optimal strategies.

To find optimal strategies, we must find vectors $p^*$ and $q^*$ that satisfy Equation 4. This is generally done by using linear
programming techniques. Next, we discuss special cases for which optimal strategies may be found by more elementary
techniques.


We now introduce the following definition. 


DEFINITION 2

An entry $a_{rs}$ in a payoff matrix A is called a saddle point if

(i) $a_{rs}$ is the smallest entry in its row, and

(ii) $a_{rs}$ is the largest entry in its column.

A game whose payoff matrix has a saddle point is called strictly determined.

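For example, the entries $a_{12} = 1$, $a_{21} = 60$, and $a_{33} = 6$ are saddle points of the following payoff matrices,
respectively:

$$\begin{bmatrix} 3 & 1 \\ -4 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 30 & -50 & -5 \\ 60 & 90 & 75 \\ -10 & 60 & -30 \end{bmatrix}, \qquad \begin{bmatrix} 0 & -3 & 5 & -9 \\ 15 & -8 & -2 & 10 \\ 7 & 10 & 6 & 9 \\ 6 & 11 & -3 & 2 \end{bmatrix}$$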


If a matrix has a saddle point $a_{rs}$, it turns out that the following strategies are optimal strategies for the two players:

$$p^* = [0 \ 0 \ \cdots \ 1 \ \cdots \ 0] \ (1 \text{ in the } r\text{th entry}), \qquad q^* = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \ (1 \text{ in the } s\text{th entry})$$

That is, an optimal strategy for player R is to always make the rth move, and an optimal strategy for player C is to
always make the sth move. Such strategies for which only one move is possible are called pure strategies. Strategies for
which more than one move is possible are called mixed strategies. To show that the above pure strategies are optimal,
you can verify the following three relations (see Exercise 6):

$$E(p^*, q^*) = p^*Aq^* = a_{rs} \qquad (6)$$

$$E(p^*, q) = p^*Aq \ge a_{rs} \quad \text{for any strategy } q \qquad (7)$$

$$E(p, q^*) = pAq^* \le a_{rs} \quad \text{for any strategy } p \qquad (8)$$

Together, these three relations imply that

$$E(p^*, q) \ge E(p^*, q^*) \ge E(p, q^*)$$

for all strategies p and q. Because this is exactly Equation 4, it follows that $p^*$ and $q^*$ are optimal strategies.

From Equation 6 the value of a strictly determined game is simply the numerical value of a saddle point $a_{rs}$. It is
possible for a payoff matrix to have several saddle points, but then the uniqueness of the value of a game guarantees that
the numerical values of all saddle points are the same.
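Definition 2 can be applied mechanically by comparing each entry with the minimum of its row and the maximum of its column. The Python sketch below (illustrative function name; positions are reported 1-based as $(r, s, a_{rs})$) checks the second and third example matrices given after Definition 2, plus a hypothetical matrix with two saddle points of equal value.

```python
def saddle_points(A):
    """All (r, s, a_rs), 1-based, with a_rs smallest in its row and largest in its column."""
    points = []
    for r, row in enumerate(A):
        for s, entry in enumerate(row):
            column = [A[i][s] for i in range(len(A))]
            if entry == min(row) and entry == max(column):
                points.append((r + 1, s + 1, entry))
    return points


second = [[30, -50, -5],
          [60, 90, 75],
          [-10, 60, -30]]
third = [[0, -3, 5, -9],
         [15, -8, -2, 10],
         [7, 10, 6, 9],
         [6, 11, -3, 2]]
two_saddles = [[2, 2],
               [1, 0]]  # hypothetical matrix with two saddle points

print(saddle_points(second), saddle_points(third), saddle_points(two_saddles))
```

An empty list means the game is not strictly determined, so the pure-strategy argument above does not apply.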

EXAMPLE 2 Optimal Strategies to Maximize a Viewing Audience 

Two competing television networks, R and C, are scheduling one-hour programs in the same time period. 
Network R can schedule one of three possible programs, and network C can schedule one of four possible 
programs. Neither network knows which program the other will schedule. Both networks ask the same 
outside polling agency to give them an estimate of how all possible pairings of the programs will divide the 
viewing audience. The agency gives them each Table 2, whose (i, j)-th entry is the percentage of the
viewing audience that will watch network R if network R's program i is paired against network C's program
j. What program should each network schedule in order to maximize its viewing audience?


Table 2

                           Network C's Program
                           1      2      3      4
Network R's       1       60     20     30     55
Program           2       50     75     45     60
                  3       70     45     35     30

Subtract 50 from each entry in Table 2 to construct the following matrix: 


$$\begin{bmatrix} 10 & -30 & -20 & 5 \\ 0 & 25 & -5 & 10 \\ 20 & -5 & -15 & -20 \end{bmatrix}$$

This is the payoff matrix of the two-person zero-sum game in which each network is considered to start
with 50% of the audience, and the (i, j)-th entry of the matrix is the percentage of the viewing audience
that network C loses to network R if programs i and j are paired against each other. It is easy to see that the
entry

$$a_{23} = -5$$

is a saddle point of the payoff matrix. Hence, the optimal strategy of network R is to schedule program 2,
and the optimal strategy of network C is to schedule program 3. This will result in network R's receiving
45% of the audience and network C's receiving 55% of the audience.
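Relations (6) to (8) can be checked numerically for this example. Because $E(p^*, q)$ is a weighted average of the payoffs $E(p^*, e_j)$ against the pure strategies $e_j$, it is enough to check (7) against every pure strategy of network C, and likewise (8) against every pure strategy of network R. The Python sketch below does this; the helper names `pure` and `E` are illustrative.

```python
# Table 2 minus 50: the payoff matrix of Example 2
table = [[60, 20, 30, 55],
         [50, 75, 45, 60],
         [70, 45, 35, 30]]
A = [[x - 50 for x in row] for row in table]
m, n = len(A), len(A[0])


def pure(k, size):
    """Pure strategy that always makes move k (0-based)."""
    return [1 if t == k else 0 for t in range(size)]


def E(p, q):
    """Expected payoff pAq."""
    return sum(p[i] * A[i][j] * q[j] for i in range(m) for j in range(n))


p_star, q_star = pure(1, m), pure(2, n)  # program 2 for R, program 3 for C
v = E(p_star, q_star)                    # Equation (6): equals a_23

# (7): against p*, no pure choice of C pushes R's payoff below v
assert all(E(p_star, pure(j, n)) >= v for j in range(n))
# (8): against q*, no pure choice of R raises R's payoff above v
assert all(E(pure(i, m), q_star) <= v for i in range(m))
print(v, 50 + v)
```

Adding back the 50 that was subtracted gives network R's share of the audience.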


2x2 Matrix Games 


Another case in which the optimal strategies can be found by elementary means occurs when each player has only two 
possible moves. In this case, the payoff matrix is a 2 x 2 matrix 


r«n *i2 

[*21 *22 


If the game is strictly determined, at least one of the four entries of A is a saddle point, and the techniques discussed 
above can then be applied to determine optimal strategies for the two players. If the game is not strictly determined, we 
first compute the expected payoff for arbitrary strategies p and q: 


£(P. q) =?Aq= [PI P2] 

= a \\P\Q\ + <*127>1‘72+<*217>2<71 + <*227>2<?2 


'<*11 <*12~ 

'<7l' 

<*21 <*22 _ 

<72 


(9) 


Because 


P\+P2=^ and <71+<72 = 1 (10) 

we may substitute p2 = 1 — pi and = 1 - <71 into 9 to obtain 


£(p, q) = a\\p\q\ + <*12/>lO -<7l) + <»2lO ~P\)<1\ +<*220 -<?l) 


( 11 ) 






















If we rearrange the terms in Equation 11, we can write

$$
E(\mathbf{p}, \mathbf{q}) = [(a_{11} + a_{22} - a_{12} - a_{21})p_1 - (a_{22} - a_{21})]q_1 + (a_{12} - a_{22})p_1 + a_{22} \tag{12}
$$

By examining the coefficient of the $q_1$ term in 12, we see that if we set

$$
p_1 = p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{13}
$$

then that coefficient is zero, and 12 reduces to

$$
E(\mathbf{p}^*, \mathbf{q}) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{14}
$$


Equation 14 is independent of q; that is, if player R chooses the strategy determined by 13, player C cannot change the 
expected payoff by varying his or her strategy. 
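The claim that Equation 14 does not depend on q can be spot-checked numerically. In this minimal sketch the payoff matrix is a hypothetical game with no saddle point, chosen only for illustration, and exact `Fraction` arithmetic is used so that equal payoffs compare exactly.

```python
from fractions import Fraction as F


def expected_payoff(A, p, q):
    """E(p, q) = p A q for a 2 x 2 payoff matrix (Equation 9)."""
    return sum(p[i] * A[i][j] * q[j] for i in range(2) for j in range(2))


# Hypothetical game with no saddle point, chosen only for illustration.
A = [[F(1), F(-1)],
     [F(-2), F(3)]]
(a11, a12), (a21, a22) = A
D = a11 + a22 - a12 - a21        # common denominator of Equations 13 and 14
p1 = (a22 - a21) / D             # Equation 13
p_star = [p1, 1 - p1]

for q1 in [F(0), F(1, 4), F(1, 2), F(1)]:
    print(q1, expected_payoff(A, p_star, [q1, 1 - q1]))
# The payoff is 1/7 on every line, which is (a11*a22 - a12*a21) / D from Equation 14.
```

Changing q only changes the weights on two quantities that p* has already made equal, which is why every line prints the same payoff.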


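In a similar manner, it can be verified that if player C chooses the strategy determined by

$$
q_1 = q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{15}
$$

then substituting in 12 gives

$$
E(\mathbf{p}, \mathbf{q}^*) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{16}
$$

Equations 14 and 16 show that

$$
E(\mathbf{p}^*, \mathbf{q}) = E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}, \mathbf{q}^*) \tag{17}
$$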


for all strategies p and q. Thus, the strategies determined by 13, 15, and 10 are optimal strategies for players R and C, 
respectively, and so we have the following result. 


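THEOREM 10.7.2 Optimal Strategies for a 2 × 2 Matrix Game

For a 2 × 2 game that is not strictly determined, optimal strategies for players R and C are

$$
\mathbf{p}^* = \begin{bmatrix}
\dfrac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} &
\dfrac{a_{11} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}}
\end{bmatrix}
$$

and

$$
\mathbf{q}^* = \begin{bmatrix}
\dfrac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \\[2ex]
\dfrac{a_{11} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}
\end{bmatrix}
$$

The value of the game is

$$
v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}
$$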

In order to be complete, we must show that the entries in the vectors $\mathbf{p}^*$ and $\mathbf{q}^*$ are numbers strictly
between 0 and 1. In Exercise 8 we ask you to show that this is the case as long as the game is not strictly determined.
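The formulas of Theorem 10.7.2 translate directly into code. This is a sketch under stated assumptions: the function name `optimal_2x2` is ours, and it assumes the caller has already checked that the game has no saddle point (the function does not test this). Entries pass through `Fraction(str(a))` so that decimal payoffs such as .85 are handled exactly; the sample input is the vaccine matrix of Example 3.

```python
from fractions import Fraction


def optimal_2x2(A):
    """Optimal strategies p*, q* and value v of a 2 x 2 game (Theorem 10.7.2).

    Assumes the game is NOT strictly determined; this is not checked here,
    so test for a saddle point first. Entries go through Fraction(str(a))
    so that decimal payoffs such as .85 are treated exactly.
    """
    (a11, a12), (a21, a22) = [[Fraction(str(a)) for a in row] for row in A]
    D = a11 + a22 - a12 - a21
    if D == 0:
        raise ValueError("zero denominator: the game must be strictly determined")
    p_star = [(a22 - a21) / D, (a11 - a12) / D]     # row vector for player R
    q_star = [(a22 - a12) / D, (a11 - a21) / D]     # column vector for player C
    v = (a11 * a22 - a12 * a21) / D
    return p_star, q_star, v


# Vaccine game of Example 3: rows are vaccines, columns are strains.
p, q, v = optimal_2x2([[0.85, 0.70], [0.60, 0.90]])
print(p, q, v, float(v))
# p = [2/3, 1/3], q = [4/9, 5/9], v = 23/30 (about .767)
```

Exact fractions make it easy to confirm that each strategy vector sums to 1 and that $E(\mathbf{p}^*, \mathbf{q}^*) = v$.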













Equation 17 is interesting in that it implies that either player can force the expected payoff to be the value of the game 
by choosing his or her optimal strategy, regardless of which strategy the other player chooses. This is not true, in 
general, for games in which either player has more than two moves. 

EXAMPLE 3 Using Theorem 10.7.2 


The federal government desires to inoculate its citizens against a certain flu virus. The virus has two
strains, and the proportions in which the two strains occur in the virus population are not known. Two
vaccines have been developed, and each citizen is given only one of them. Vaccine 1 is 85% effective
against strain 1 and 70% effective against strain 2. Vaccine 2 is 60% effective against strain 1 and 90%
effective against strain 2. What inoculation policy should the government adopt?


We can consider this a two-person game in which player R (the government) desires to make 
the payoff (the fraction of citizens resistant to the virus) as large as possible, and player C (the virus) 
desires to make the payoff as small as possible. The payoff matrix is 

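$$
\begin{array}{r|cc}
 & \text{Strain 1} & \text{Strain 2} \\ \hline
\text{Vaccine 1} & .85 & .70 \\
\text{Vaccine 2} & .60 & .90
\end{array}
$$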


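This matrix has no saddle points, so Theorem 10.7.2 is applicable. Consequently,

$$
\begin{aligned}
p_1^* &= \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .60}{.85 + .90 - .70 - .60} = \frac{.30}{.45} = \frac{2}{3},
& p_2^* &= 1 - p_1^* = \frac{1}{3} \\
q_1^* &= \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .70}{.85 + .90 - .70 - .60} = \frac{.20}{.45} = \frac{4}{9},
& q_2^* &= 1 - q_1^* = \frac{5}{9} \\
v &= \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{(.85)(.90) - (.70)(.60)}{.85 + .90 - .70 - .60} = \frac{.345}{.45} = .7666\ldots
\end{aligned}
$$

Thus, the optimal strategy for the government is to inoculate 2/3 of the citizens with vaccine 1 and 1/3 of the
citizens with vaccine 2. This will guarantee that about 76.7% of the citizens will be resistant to a virus
attack regardless of the distribution of the two strains.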


In contrast, a virus distribution of 4/9 of strain 1 and 5/9 of strain 2 will result in the same 76.7% of resistant
citizens, regardless of the inoculation strategy adopted by the government (see Exercise 7).


Exercise Set 10.7 



6 

-7 

0 


-4 1 

3 8 

6 -2 


1. Suppose that a game has a payoff matrix 












(a) If players R and C use strategies 



respectively, what is the expected payoff of the game? 

(b) If player C keeps his strategy fixed as in part (a), what strategy should player R choose to maximize his expected 
payoff? 

(c) If player R keeps her strategy fixed as in part (a), what strategy should player C choose to minimize the expected 
payoff to player R?

Answer: 


(a) −5/8

(b) [0 1 0]

(c) $[1\ \ 0\ \ 0\ \ 0]^T$

2. Construct a simple example to show that optimal strategies are not necessarily unique. For example, find a payoff 
matrix with several equal saddle points. 

Answer: 

Let A = | | , for example. 

3. For the strictly determined games with the following payoff matrices, find optimal strategies for the two players, and 
find the values of the games. 

(a) [5 2 

1 3_ 

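(b) $\begin{bmatrix} -3 & -2 \\ 2 & 4 \\ -4 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} 2 & -2 & 0 \\ -6 & 0 & -5 \\ 5 & 2 & 3 \end{bmatrix}$

(d) $\begin{bmatrix} -3 & 2 & -1 \\ -2 & -1 & 5 \\ -4 & 1 & 0 \\ -3 & 4 & 6 \end{bmatrix}$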


Answer: 


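(a) $\mathbf{p}^* = [0\ \ 1]$, $\mathbf{q}^* = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $v = 3$

(b) $\mathbf{p}^* = [0\ \ 1\ \ 0]$, $\mathbf{q}^* = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $v = 2$

(c) $\mathbf{p}^* = [0\ \ 0\ \ 1]$, $\mathbf{q}^* = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, $v = 2$

(d) $\mathbf{p}^* = [0\ \ 1\ \ 0\ \ 0]$, $\mathbf{q}^* = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $v = -2$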


4. For the 2 x 2 games with the following payoff matrices, find optimal strategies for the two players, and find the 
values of the games. 

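(a) $\begin{bmatrix} 6 & 3 \\ -1 & 4 \end{bmatrix}$

(b) $\begin{bmatrix} 40 & 20 \\ -10 & 30 \end{bmatrix}$

(d) $\begin{bmatrix} 3 & 5 \\ 5 & 2 \end{bmatrix}$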



Answer: 



5. Player R has two playing cards: a black ace and a red four. Player C also has two cards: a black two and a red three. 
Each player secretly selects one of his or her cards. If both selected cards are the same color, player C pays player R 
the sum of the face values in dollars. If the cards are different colors, player R pays player C the sum of the face 
values. What are optimal strategies for both players, and what is the value of the game? 


Answer: 




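$\mathbf{p}^* = \begin{bmatrix} \frac{13}{20} & \frac{7}{20} \end{bmatrix}$, $\mathbf{q}^* = \begin{bmatrix} \frac{11}{20} \\[0.5ex] \frac{9}{20} \end{bmatrix}$, $v = -\frac{3}{20}$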


6. Verify Equations 6, 7, and 8.

7. Verify the statement in the last paragraph of Example 3.

8. Show that the entries of the optimal strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ given in Theorem 10.7.2 are numbers strictly between zero
and one.


Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB,
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific
calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for 
the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your 
technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology 
utility to solve many of the problems in the regular exercise sets. 

T1. Consider a game between two players where each player can make up to n different moves (n > 1). If the ith move
of player R and the jth move of player C are such that i + j is even, then C pays R $1. If i + j is odd, then R pays C $1.
Assume that both players have the same strategy, that is, $\mathbf{p}_n = [p_i]_{1 \times n}$ and $\mathbf{q}_n = [p_i]_{n \times 1}$, where
$p_1 + p_2 + p_3 + \cdots + p_n = 1$. Use a computer to show that

$$
\begin{aligned}
E(\mathbf{p}_2, \mathbf{q}_2) &= (p_1 - p_2)^2 \\
E(\mathbf{p}_3, \mathbf{q}_3) &= (p_1 - p_2 + p_3)^2 \\
E(\mathbf{p}_4, \mathbf{q}_4) &= (p_1 - p_2 + p_3 - p_4)^2 \\
E(\mathbf{p}_5, \mathbf{q}_5) &= (p_1 - p_2 + p_3 - p_4 + p_5)^2
\end{aligned}
$$

Using these results as a guide, prove in general that the expected payoff to player R is

$$
E(\mathbf{p}_n, \mathbf{q}_n) = \Bigl(\sum_{j=1}^{n} (-1)^{j+1} p_j\Bigr)^2 \ge 0
$$

which shows that in the long run, player R will not lose in this game.

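T2. Consider a game between two players where each player can make up to n different moves (n > 1). If both players
make the same move, then player C pays player R $(n − 1). However, if both players make different moves, then
player R pays player C $1. Assume that both players have the same strategy, that is, $\mathbf{p}_n = [p_i]_{1 \times n}$ and
$\mathbf{q}_n = [p_i]_{n \times 1}$, where $p_1 + p_2 + p_3 + \cdots + p_n = 1$. Use a computer to show that

$$
\begin{aligned}
E(\mathbf{p}_2, \mathbf{q}_2) &= \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 \\
E(\mathbf{p}_3, \mathbf{q}_3) &= \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_1 - p_3)^2 \\
&\quad + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 + \tfrac{1}{2}(p_2 - p_3)^2 \\
&\quad + \tfrac{1}{2}(p_3 - p_1)^2 + \tfrac{1}{2}(p_3 - p_2)^2 + \tfrac{1}{2}(p_3 - p_3)^2 \\
E(\mathbf{p}_4, \mathbf{q}_4) &= \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_1 - p_3)^2 + \tfrac{1}{2}(p_1 - p_4)^2 \\
&\quad + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 + \tfrac{1}{2}(p_2 - p_3)^2 + \tfrac{1}{2}(p_2 - p_4)^2 \\
&\quad + \tfrac{1}{2}(p_3 - p_1)^2 + \tfrac{1}{2}(p_3 - p_2)^2 + \tfrac{1}{2}(p_3 - p_3)^2 + \tfrac{1}{2}(p_3 - p_4)^2 \\
&\quad + \tfrac{1}{2}(p_4 - p_1)^2 + \tfrac{1}{2}(p_4 - p_2)^2 + \tfrac{1}{2}(p_4 - p_3)^2 + \tfrac{1}{2}(p_4 - p_4)^2
\end{aligned}
$$

Using these results as a guide, prove in general that the expected payoff to player R is

$$
E(\mathbf{p}_n, \mathbf{q}_n) = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (p_i - p_j)^2 \ge 0
$$

which shows that in the long run, player R will not lose in this game.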
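Before attempting the proofs in T1 and T2, the two identities can be spot-checked with random strategies. This is a minimal Python sketch of such a check, not a proof; the helper names, the fixed seed, the range n = 2, ..., 6, and the tolerance 1e-12 are all our own choices.

```python
import random


def same_strategy_payoff(A, p):
    """E(p_n, q_n) = p A p^T when both players use the strategy p."""
    n = len(p)
    return sum(p[i] * A[i][j] * p[j] for i in range(n) for j in range(n))


def random_strategy(n, rng):
    """A random probability vector of length n."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]


rng = random.Random(0)            # fixed seed so the check is repeatable
for n in range(2, 7):
    p = random_strategy(n, rng)
    # T1: R wins $1 when i + j is even, loses $1 when i + j is odd.
    # (0-based indices give the same parity of i + j as 1-based ones.)
    A1 = [[(-1) ** (i + j) for j in range(n)] for i in range(n)]
    alternating = sum((-1) ** i * p[i] for i in range(n))   # p1 - p2 + p3 - ...
    assert abs(same_strategy_payoff(A1, p) - alternating ** 2) < 1e-12
    # T2: R wins $(n - 1) on a match, loses $1 otherwise.
    A2 = [[n - 1 if i == j else -1 for j in range(n)] for i in range(n)]
    half_sum = 0.5 * sum((p[i] - p[j]) ** 2 for i in range(n) for j in range(n))
    assert abs(same_strategy_payoff(A2, p) - half_sum) < 1e-12
print("both identities hold for n = 2, ..., 6")
```

Floating-point agreement for a handful of random vectors only supports the conjectures; the general statements still need the algebraic arguments the exercises ask for.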
10.8 Leontief Economic Models

In this section we discuss two linear models for economic systems. Some results about nonnegative matrices are applied to determine 
equilibrium price structures and outputs necessary to satisfy demand. 


Prerequisites 

Linear Systems 
Matrices 


Economic Systems 

Matrix theory has been very successful in describing the interrelations among prices, outputs, and demands in economic systems. In 
this section we discuss some simple models based on the ideas of Nobel laureate Wassily Leontief. We examine two different but 
related models: the closed or input-output model, and the open or production model. In each, we are given certain economic 
parameters that describe the interrelations between the “industries” in the economy under consideration. Using matrix theory, we then 
evaluate certain other parameters, such as prices or output levels, in order to satisfy a desired economic objective. We begin with the 
closed model. 


Leontief Closed (Input-Output) Model 

First we present a simple example; then we proceed to the general theory of the model. 

EXAMPLE 1 An Input-Output Model 

Three homeowners—a carpenter, an electrician, and a plumber—agree to make repairs in their three homes. They agree 
to work a total of 10 days each according to the following schedule: 



                                          Work Performed by
                                    Carpenter   Electrician   Plumber
Days of Work in Home of Carpenter       2            1           6
Days of Work in Home of Electrician     4            5           1
Days of Work in Home of Plumber         4            4           3


For tax purposes, they must report and pay each other a reasonable daily wage, even for the work each does on his or her 
own home. Their normal daily wages are about $100, but they agree to adjust their respective daily wages so that each 
homeowner will come out even—that is, so that the total amount paid out by each is the same as the total amount each 
receives. We can set 

$p_1$ = daily wage of carpenter
$p_2$ = daily wage of electrician
$p_3$ = daily wage of plumber

To satisfy the “equilibrium” condition that each homeowner comes out even, we require that 

total expenditures = total income 

for each of the homeowners for the 10-day period. For example, the carpenter pays a total of $2p_1 + p_2 + 6p_3$ for the
repairs in his own home and receives a total income of $10p_1$ for the repairs that he performs on all three homes.

Equating these two expressions then gives the first of the following three equations:

$$
\begin{aligned}
2p_1 + p_2 + 6p_3 &= 10p_1 \\
4p_1 + 5p_2 + p_3 &= 10p_2 \\
4p_1 + 4p_2 + 3p_3 &= 10p_3
\end{aligned}
$$

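The remaining two equations are the equilibrium equations for the electrician and the plumber. Dividing these equations
by 10 and rewriting them in matrix form yields

$$
\begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
= \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} \tag{1}
$$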

Equation 1 can be rewritten as a homogeneous system by subtracting the left side from the right side to obtain

$$
\begin{bmatrix} .8 & -.1 & -.6 \\ -.4 & .5 & -.1 \\ -.4 & -.4 & .7 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

The solution of this homogeneous system is found to be (verify)

$$
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = s\begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}
$$

where s is an arbitrary constant. This constant is a scale factor, which the homeowners may choose for their
convenience. For example, they can set s = 3 so that the corresponding daily wages ($93, $96, and $108) are about
$100.
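The "(verify)" step can be carried out exactly with rational arithmetic. The sketch below is one way to do it, not the text's method: it fixes $p_3 = 1$, drops the last equation (it is redundant, because the column sums of $I - E$ are zero), solves the remaining system by Gauss-Jordan elimination, and rescales to the smallest integer vector. The helper names are hypothetical, and it assumes the leading block of $I - E$ is invertible, which holds when every entry of E is positive. Multi-argument `gcd`/`lcm` need Python 3.9 or later.

```python
from fractions import Fraction
from math import gcd, lcm   # multi-argument gcd/lcm need Python 3.9+


def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]               # scale pivot row to 1
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def equilibrium_prices(E):
    """Smallest positive integer solution of (I - E)p = 0.

    Fixes p_k = 1, drops the last (redundant) equation, and solves for
    p_1, ..., p_{k-1}. Assumes the leading (k-1) x (k-1) block of I - E
    is invertible, e.g. when every entry of E is positive.
    """
    E = [[Fraction(e) for e in row] for row in E]
    k = len(E)
    I_minus_E = [[(1 if i == j else 0) - E[i][j] for j in range(k)] for i in range(k)]
    A = [row[:k - 1] for row in I_minus_E[:k - 1]]
    b = [-row[k - 1] for row in I_minus_E[:k - 1]]
    p = solve(A, b) + [Fraction(1)]
    scale = lcm(*(x.denominator for x in p))
    ints = [int(x * scale) for x in p]
    g = gcd(*ints)
    return [x // g for x in ints]


# Exchange matrix of Equation 1 (entries as strings so Fraction is exact).
E = [["0.2", "0.1", "0.6"],
     ["0.4", "0.5", "0.1"],
     ["0.4", "0.4", "0.3"]]
print(equilibrium_prices(E))   # [31, 32, 36]
```

Multiplying the result by s = 3 gives the daily wages $93, $96, and $108 quoted above.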


This example illustrates the salient features of the Leontief input-output model of a closed economy. In the basic Equation 1, each 
column sum of the coefficient matrix is 1, corresponding to the fact that each of the homeowners' “output” of labor is completely 
distributed among these same homeowners in the proportions given by the entries in the column. Our problem is to determine suitable 
“prices” for these outputs so as to put the system in equilibrium—that is, so that each homeowner's total expenditures equal his or her 
total income. 

In the general model we have an economic system consisting of a finite number of “industries,” which we number as industries
1, 2, ..., k. Over some fixed period of time, each industry produces an “output” of some good or service that is completely utilized in a
predetermined manner by the k industries. An important problem is to find suitable “prices” to be charged for these k outputs so that
for each industry, total expenditures equal total income. Such a price structure represents an equilibrium position for the economy.

For the fixed time period in question, let us set

$p_i$ = price charged by the ith industry for its total output

$e_{ij}$ = fraction of the total output of the jth industry purchased by the ith industry

for i, j = 1, 2, ..., k. By definition, we have

(i) $p_i \ge 0$, i = 1, 2, ..., k

(ii) $e_{ij} \ge 0$, i, j = 1, 2, ..., k

(iii) $e_{1j} + e_{2j} + \cdots + e_{kj} = 1$, j = 1, 2, ..., k

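With these quantities, we form the price vector

$$
\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix}
$$

and the exchange matrix or input-output matrix

$$
E = \begin{bmatrix}
e_{11} & e_{12} & \cdots & e_{1k} \\
e_{21} & e_{22} & \cdots & e_{2k} \\
\vdots & \vdots & & \vdots \\
e_{k1} & e_{k2} & \cdots & e_{kk}
\end{bmatrix}
$$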

Condition (iii) expresses the fact that all the column sums of the exchange matrix are 1. 

As in the example, in order that the expenditures of each industry be equal to its income, the following matrix equation must be
satisfied [see Equation 1]:

$$
E\mathbf{p} = \mathbf{p} \tag{2}
$$

or

$$
(I - E)\mathbf{p} = \mathbf{0} \tag{3}
$$

Equation 3 is a homogeneous linear system for the price vector p. It will have a nontrivial solution if and only if the determinant of its
coefficient matrix $I - E$ is zero. In Exercise 7 we ask you to show that this is the case for any exchange matrix E. Thus, Equation 3 always has
nontrivial solutions for the price vector p.

Actually, for our economic model to make sense, we need more than just the fact that 3 has nontrivial solutions for p. We also need the
prices $p_i$ of the k outputs to be nonnegative numbers. We express this condition as $\mathbf{p} \ge \mathbf{0}$. (In general, if A is any vector or matrix, the
notation $A \ge 0$ means that every entry of A is nonnegative, and the notation $A > 0$ means that every entry of A is positive. Similarly,
$A \ge B$ means $A - B \ge 0$, and $A > B$ means $A - B > 0$.) To show that 3 has a nontrivial solution for which $\mathbf{p} \ge \mathbf{0}$ is a bit more difficult
than showing merely that some nontrivial solution exists. But it is true, and we state this fact without proof in the following theorem.


THEOREM 10.8.1 

If E is an exchange matrix, then $E\mathbf{p} = \mathbf{p}$ always has a nontrivial solution p whose entries are nonnegative.


Let us consider a few simple examples of this theorem. 

EXAMPLE 2 Using Theorem 10.8.1 


Let 


Then $(I - E)\mathbf{p} = \mathbf{0}$ is


which has the general solution 



$$
\mathbf{p} = s\begin{bmatrix} 0 \\ 1 \end{bmatrix}
$$

where s is an arbitrary constant. We then have nontrivial solutions $\mathbf{p} \ge \mathbf{0}$ for any $s > 0$.


EXAMPLE 3 Using Theorem 10.8.1 














Then $(I - E)\mathbf{p} = \mathbf{0}$ has the general solution


where s and t are independent arbitrary constants. Nontrivial solutions $\mathbf{p} \ge \mathbf{0}$ then result from any $s \ge 0$ and $t \ge 0$, not
both zero.


Example 2 indicates that in some situations one of the prices must be zero in order to satisfy the equilibrium condition. Example 3 
indicates that there may be several linearly independent price structures available. Neither of these situations describes a truly 
interdependent economic structure. The following theorem gives sufficient conditions for both cases to be excluded. 


THEOREM 10.8.2 

Let E be an exchange matrix such that for some positive integer m all the entries of $E^m$ are positive. Then there is exactly one
linearly independent solution of $(I - E)\mathbf{p} = \mathbf{0}$, and it may be chosen so that all its entries are positive.


We will not give a proof of this theorem. If you have read Section 10.5 on Markov chains, observe that this theorem is essentially the
same as Theorem 10.5.4. What we are calling exchange matrices in this section were called stochastic or Markov matrices in Section
10.5.
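The hypothesis of Theorem 10.8.2 can be tested by computing successive powers of E. The sketch below is illustrative only: `first_positive_power` is a hypothetical helper, and `max_power` is an arbitrary cutoff, so a result of `None` only means no positive power was found in that range. The first matrix is the exchange matrix of Example 1; the other two are hypothetical exchange matrices (column sums 1) chosen to show a matrix with a zero entry whose square is positive, and one (the identity) that never satisfies the hypothesis.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def first_positive_power(E, max_power=50):
    """Smallest m <= max_power such that every entry of E^m is positive,
    or None if no such m is found (max_power is an arbitrary cutoff)."""
    P = E
    for m in range(1, max_power + 1):
        if all(entry > 0 for row in P for entry in row):
            return m
        P = matmul(P, E)
    return None


F = Fraction
E_example1 = [[F(2, 10), F(1, 10), F(6, 10)],
              [F(4, 10), F(5, 10), F(1, 10)],
              [F(4, 10), F(4, 10), F(3, 10)]]
E_zero_entry = [[0, F(1, 2)], [1, F(1, 2)]]   # hypothetical; E^2 is positive
E_identity = [[1, 0], [0, 1]]                 # hypothetical; no power is positive
print(first_positive_power(E_example1),
      first_positive_power(E_zero_entry),
      first_positive_power(E_identity))     # 1 2 None
```

For the identity, every vector solves $(I - E)\mathbf{p} = \mathbf{0}$, so there are two linearly independent price structures, the situation the theorem's hypothesis rules out.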


EXAMPLE 4 Using Theorem 10.8.2 


The exchange matrix in Example 1 was

$$
E = \begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}
$$

Because $E > 0$, the condition $E^m > 0$ in Theorem 10.8.2 is satisfied for m = 1. Consequently, we are guaranteed that
there is exactly one linearly independent solution of $(I - E)\mathbf{p} = \mathbf{0}$, and it can be chosen so that $\mathbf{p} > \mathbf{0}$. In that example,
we found that

$$
\mathbf{p} = \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}
$$

is such a solution.


Leontief Open (Production) Model 

In contrast with the closed model, in which the outputs of k industries are distributed only among themselves, the open model attempts 
to satisfy an outside demand for the outputs. Portions of these outputs can still be distributed among the industries themselves, to keep 
them operating, but there is to be some excess, some net production, with which to satisfy the outside demand. In the closed model the 
outputs of the industries are fixed, and our objective is to determine prices for these outputs so that the equilibrium condition, that 
expenditures equal incomes, is satisfied. In the open model it is the prices that are fixed, and our objective is to determine levels of the 
outputs of the industries needed to satisfy the outside demand. We will measure the levels of the outputs in terms of their economic 
values using the fixed prices. To be precise, over some fixed period of time, let 












$x_i$ = monetary value of the total output of the ith industry

$d_i$ = monetary value of the output of the ith industry needed to satisfy the outside demand

$c_{ij}$ = monetary value of the output of the ith industry needed by the jth industry to produce one unit of monetary value of its own output


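With these quantities, we define the production vector

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
$$

the demand vector

$$
\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_k \end{bmatrix}
$$

and the consumption matrix

$$
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1k} \\
c_{21} & c_{22} & \cdots & c_{2k} \\
\vdots & \vdots & & \vdots \\
c_{k1} & c_{k2} & \cdots & c_{kk}
\end{bmatrix}
$$

By their nature, we have that

$$
\mathbf{x} \ge \mathbf{0}, \qquad \mathbf{d} \ge \mathbf{0}, \qquad\text{and}\qquad C \ge 0
$$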


From the definition of $c_{ij}$ and $x_j$, it can be seen that the quantity

$$
c_{i1}x_1 + c_{i2}x_2 + \cdots + c_{ik}x_k
$$

is the value of the output of the ith industry needed by all k industries to produce a total output specified by the production vector x.
Because this quantity is simply the ith entry of the column vector $C\mathbf{x}$, we can say further that the ith entry of the column vector

$$
\mathbf{x} - C\mathbf{x}
$$

is the value of the excess output of the ith industry available to satisfy the outside demand. The value of the outside demand for the
output of the ith industry is the ith entry of the demand vector d. Consequently, we are led to the following equation

$$
\mathbf{x} - C\mathbf{x} = \mathbf{d}
$$

or

$$
(I - C)\mathbf{x} = \mathbf{d} \tag{4}
$$

for the demand to be exactly met, without any surpluses or shortages. Thus, given C and d, our objective is to find a production vector
$\mathbf{x} \ge \mathbf{0}$ that satisfies Equation 4.

EXAMPLE 5 Production Vector for a Town 

A town has three main industries: a coal-mining operation, an electric power-generating plant, and a local railroad. To 
mine $1 of coal, the mining operation must purchase $.25 of electricity to run its equipment and $.25 of transportation 
for its shipping needs. To produce $1 of electricity, the generating plant requires $.65 of coal for fuel, $.05 of its own 
electricity to run auxiliary equipment, and $.05 of transportation. To provide $1 of transportation, the railroad requires 
$.55 of coal for fuel and $.10 of electricity for its auxiliary equipment. In a certain week the coal-mining operation 
receives orders for $50,000 of coal from outside the town, and the generating plant receives orders for $25,000 of 
electricity from outside. There is no outside demand for the local railroad. How much must each of the three industries 
produce in that week to exactly satisfy their own demand and the outside demand? 

For the one-week period let 

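$x_1$ = value of total output of coal-mining operation
$x_2$ = value of total output of power-generating plant
$x_3$ = value of total output of local railroad

From the information supplied, the consumption matrix of the system is

$$
C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}
$$

The linear system $(I - C)\mathbf{x} = \mathbf{d}$ is then

$$
\begin{bmatrix} 1.00 & -.65 & -.55 \\ -.25 & .95 & -.10 \\ -.25 & -.05 & 1.00 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}
$$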


The coefficient matrix on the left is invertible, and the solution is given by 


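$$
\mathbf{x} = (I - C)^{-1}\mathbf{d}
= \frac{1}{503}\begin{bmatrix} 756 & 542 & 470 \\ 220 & 690 & 190 \\ 200 & 170 & 630 \end{bmatrix}
\begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}
\approx \begin{bmatrix} 102{,}087 \\ 56{,}163 \\ 28{,}330 \end{bmatrix}
$$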


Thus, the total output of the coal-mining operation should be $102,087, the total output of the power-generating plant 
should be $56,163, and the total output of the railroad should be $28,330. 
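The same production vector can be approximated without forming $(I - C)^{-1}$. The sketch below swaps in an iterative method that the text does not use here: it repeats $\mathbf{x} \leftarrow \mathbf{d} + C\mathbf{x}$, whose iterates are the partial sums $\mathbf{d} + C\mathbf{d} + C^2\mathbf{d} + \cdots$, a series that converges when C is productive (Exercise 9 outlines why). The function name, tolerance, and iteration cap are our own assumptions.

```python
def leontief_output(C, d, tol=1e-9, max_iter=10_000):
    """Approximate x = (I - C)^(-1) d by iterating x <- d + C x, whose iterates
    are the partial sums d + Cd + ... + C^m d (converges when C is productive)."""
    k = len(d)
    x = list(d)
    for _ in range(max_iter):
        new_x = [d[i] + sum(C[i][j] * x[j] for j in range(k)) for i in range(k)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("no convergence; C may not be productive")


# Consumption matrix and outside demand of Example 5.
C = [[0.00, 0.65, 0.55],
     [0.25, 0.05, 0.10],
     [0.25, 0.05, 0.00]]
d = [50_000, 25_000, 0]
x = leontief_output(C, d)
print([round(v) for v in x])   # [102087, 56163, 28330]
```

Because every column sum of this C is at most .75, each pass shrinks the remaining error by at least that factor, so the loop stops after roughly a hundred iterations.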


Let us reconsider Equation 4:

$$
(I - C)\mathbf{x} = \mathbf{d}
$$

If the square matrix $I - C$ is invertible, we can write

$$
\mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5}
$$


In addition, if the matrix $(I - C)^{-1}$ has only nonnegative entries, then we are guaranteed that for any $\mathbf{d} \ge \mathbf{0}$, Equation 5 has a unique
nonnegative solution for x. This is a particularly desirable situation, as it means that any outside demand can be met. The terminology
used to describe this case is given in the following definition.


DEFINITION 1

A consumption matrix C is said to be productive if $(I - C)^{-1}$ exists and

$$
(I - C)^{-1} \ge 0
$$


We will now consider some simple criteria that guarantee that a consumption matrix is productive. The first is given in the following 
theorem. 


THEOREM 10.8.3 Productive Consumption Matrix

A consumption matrix C is productive if and only if there is some production vector $\mathbf{x} \ge \mathbf{0}$ such that $\mathbf{x} > C\mathbf{x}$.


(The proof is outlined in Exercise 9.) The condition x > Cx means that there is some production schedule possible such that each 
industry produces more than it consumes. 

Theorem 10.8.3 has two interesting corollaries. Suppose that all the row sums of C are less than 1. If

$$
\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
$$

then $C\mathbf{x}$ is a column vector whose entries are these row sums. Therefore, $\mathbf{x} > C\mathbf{x}$, and the condition of Theorem 10.8.3 is satisfied.
Thus, we arrive at the following corollary:


COROLLARY 10.8.4 

A consumption matrix is productive if each of its row sums is less than 1. 


As we ask you to show in Exercise 8, this corollary leads to the following: 


COROLLARY 10.8.5 

A consumption matrix is productive if each of its column sums is less than 1. 


Recalling the definition of the entries of the consumption matrix C, we see that the jth column sum of C is the total value of the outputs
of all k industries needed to produce one unit of value of output of the jth industry. The jth industry is thus said to be profitable if that
jth column sum is less than 1. In other words, Corollary 10.8.5 says that a consumption matrix is productive if all k industries in the
economic system are profitable.
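Both corollaries reduce to comparing sums with 1, so they are quick to check. In this minimal sketch (the helper name `sum_tests` is ours), a `True` in either position is sufficient, but not necessary, for C to be productive; the matrix is the consumption matrix of Example 5, which shows that the column test can succeed when the row test fails.

```python
from fractions import Fraction


def sum_tests(C):
    """Return (row test, column test) from Corollaries 10.8.4 and 10.8.5.
    Either test passing is sufficient, but not necessary, for C to be productive."""
    rows_ok = all(sum(row) < 1 for row in C)
    cols_ok = all(sum(col) < 1 for col in zip(*C))
    return rows_ok, cols_ok


# Consumption matrix of Example 5 (strings keep the decimals exact).
C = [[Fraction(s) for s in row] for row in [["0", "0.65", "0.55"],
                                            ["0.25", "0.05", "0.10"],
                                            ["0.25", "0.05", "0"]]]
print(sum_tests(C))   # (False, True)
```

The first row sums to 1.20, so Corollary 10.8.4 does not apply, but the column sums .50, .75, and .65 are all below 1.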

EXAMPLE 6 Using Corollary 10.8.5 


The consumption matrix in Example 5 was

$$
C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}
$$

All three column sums in this matrix are less than 1, so all three industries are profitable. Consequently, by Corollary
10.8.5, the consumption matrix C is productive. This can also be seen in the calculations in Example 5, as $(I - C)^{-1}$ is
nonnegative.


Exercise Set 10.8 


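1. For the following exchange matrices, find nonnegative price vectors that satisfy the equilibrium condition 3.

(a) $\begin{bmatrix} \frac{1}{2} & \frac{1}{3} \\[0.5ex] \frac{1}{2} & \frac{2}{3} \end{bmatrix}$

(b) $\begin{bmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\[0.5ex] \frac{1}{3} & 0 & \frac{1}{2} \\[0.5ex] \frac{1}{6} & 1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} .35 & .50 & .30 \\ .25 & .20 & .30 \\ .40 & .30 & .40 \end{bmatrix}$

Answer:

(a) $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$

(b) $\begin{bmatrix} 6 \\ 5 \\ 6 \end{bmatrix}$

(c) $\begin{bmatrix} 78 \\ 54 \\ 79 \end{bmatrix}$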

2. Using Theorem 10.8.3 and its corollaries, show that each of the following consumption matrices is productive.

(a) r.8 r 

.3 .6_ 

(b) $\begin{bmatrix} .70 & .30 & .25 \\ .20 & .40 & .25 \\ .05 & .15 & .25 \end{bmatrix}$

(c) $\begin{bmatrix} .7 & .3 & .2 \\ .1 & .4 & .3 \\ .2 & .4 & .1 \end{bmatrix}$

Answer: 



Answer: 

g'* has all positive entries. 

4. Three neighbors have backyard vegetable gardens. Neighbor A grows tomatoes, neighbor B grows corn, and neighbor C grows
lettuce. They agree to divide their crops among themselves as follows: A gets 1/2 of the tomatoes, 1/3 of the corn, and 1/4 of the
lettuce. B gets 1/3 of the tomatoes, 1/3 of the corn, and 1/4 of the lettuce. C gets 1/6 of the tomatoes, 1/3 of the corn, and 1/2 of the lettuce.
What prices should the neighbors assign to their respective crops if the equilibrium condition of a closed economy is to be satisfied,
and if the lowest-priced crop is to have a price of $100?


Answer: 


Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67 

5. Three engineers, a civil engineer (CE), an electrical engineer (EE), and a mechanical engineer (ME), each have a consulting firm.
The consulting they do is of a multidisciplinary nature, so they buy a portion of each other's services. For each $1 of consulting the
CE does, she buys $.10 of the EE's services and $.30 of the ME's services. For each $1 of consulting the EE does, she buys $.20 of 
the CE's services and $.40 of the ME's services. And for each $1 of consulting the ME does, she buys $.30 of the CE's services and 
$.40 of the EE's services. In a certain week the CE receives outside consulting orders of $500, the EE receives outside consulting 
orders of $700, and the ME receives outside consulting orders of $600. What dollar amount of consulting does each engineer 
perform in that week? 


Answer: 



$1256 for the CE, $1448 for the EE, $1556 for the ME 


6. (a) Suppose that the demand $d_i$ for the output of the ith industry increases by one unit. Explain why the ith column of the matrix
$(I - C)^{-1}$ is the increase that must be made to the production vector x to satisfy this additional demand.

(b) Referring to Example 5, use the result in part (a) to determine the increase in the value of the output of the coal-mining
operation needed to satisfy a demand of one additional unit in the value of the output of the power-generating plant.

Answer:

(b) 542/503


7. Using the fact that the column sums of an exchange matrix E are all 1, show that the column sums of $I - E$ are zero. From this,
show that $I - E$ has zero determinant, and so $(I - E)\mathbf{p} = \mathbf{0}$ has nontrivial solutions for p.

8. Show that Corollary 10.8.5 follows from Corollary 10.8.4.

[Hint: Use the fact that $(A^T)^{-1} = (A^{-1})^T$ for any invertible matrix A.]


9. (Calculus required) Prove Theorem 10.8.3 as follows:

(a) Prove the “only if” part of the theorem; that is, show that if C is a productive consumption matrix, then there is a vector $\mathbf{x} \ge \mathbf{0}$
such that $\mathbf{x} > C\mathbf{x}$.

(b) Prove the “if” part of the theorem as follows:

Step 1 Show that if there is a vector $\mathbf{x}^* \ge \mathbf{0}$ such that $C\mathbf{x}^* < \mathbf{x}^*$, then $\mathbf{x}^* > \mathbf{0}$.
Step 2 Show that there is a number $\lambda$ such that $0 < \lambda < 1$ and $C\mathbf{x}^* < \lambda\mathbf{x}^*$.
Step 3 Show that $C^n\mathbf{x}^* < \lambda^n\mathbf{x}^*$ for n = 1, 2, ....

Step 4 Show that $C^n \to 0$ as $n \to \infty$.
Step 5 By multiplying out, show that

$$
(I - C)(I + C + C^2 + \cdots + C^{n-1}) = I - C^n
$$

for n = 1, 2, ....

Step 6 By letting $n \to \infty$ in Step 5, show that the matrix infinite sum

$$
S = I + C + C^2 + \cdots
$$

exists and that $(I - C)S = I$.

Step 7 Show that $S \ge 0$ and that $S = (I - C)^{-1}$.

Step 8 Show that C is a productive consumption matrix.


Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of 
these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. 

T1. Consider a sequence of exchange matrices $\{E_2, E_3, E_4, \ldots, E_n\}$, where



S 2 = 


0 

1 


1 

2 

1 

2 


0 


* 3 = 1 


1 

2 

0 

1 

2 


1 

3 

i 

3 

i 

3 


0 


E 4 = 


1 

0 


0 


1 

2 

0 

1 

2 

0 


I 

3 

1 

3 

0 

I 

3 


1 

4 

1 

4 

i 

4 

i 

4 



1 

2 

0 

1 

2 

0 


0 0 


I I 

3 4 

I 1 

3 4 


0 4 


4 0 


o 4 


1 

5 

1 

5 

i 

5 

I 

5 

I 

5 


and so on. Use a computer to show that $E_2^2 > 0$, $E_3^3 > 0$, $E_4^4 > 0$, $E_5^5 > 0$, and make the conjecture that although $E_n^n > 0$ is true,
$E_n^k > 0$ is not true for k = 1, 2, 3, ..., n − 1. Next, use a computer to determine the vectors $\mathbf{p}_n$ such that $E_n\mathbf{p}_n = \mathbf{p}_n$ (for n = 2, 3, 4,
5, 6), and then see if you can discover a pattern that would allow you to compute $\mathbf{p}_{n+1}$ easily from $\mathbf{p}_n$. Test your discovery by first
constructing $\mathbf{p}_8$ from

$$
\mathbf{p}_7 = \begin{bmatrix} 2520 \\ 3360 \\ 1890 \\ 672 \\ 175 \\ 36 \\ 7 \end{bmatrix}
$$

and then checking to see whether $E_8\mathbf{p}_8 = \mathbf{p}_8$.

T2. Consider an open production model having n industries with n > 1. In order to produce $1 of its own output, the jth industry must
spend $(1/n) for the output of the ith industry (for all i ≠ j), but the jth industry (for all j = 1, 2, 3, ..., n) spends nothing for its own
output. Construct the consumption matrix $C_n$, show that it is productive, and determine an expression for $(I_n - C_n)^{-1}$. In
determining an expression for $(I_n - C_n)^{-1}$, use a computer to study the cases when n = 2, 3, 4, and 5; then make a conjecture and
prove your conjecture to be true. [Hint: If $F_n = [1]_{n \times n}$ (i.e., the n × n matrix with every entry equal to 1), first show that

$$
F_n^2 = nF_n
$$

and then express your value of $(I_n - C_n)^{-1}$ in terms of n, $I_n$, and $F_n$.]
10.9 Forest Management

In this section we discuss a matrix model for the management of a forest where trees are grouped into classes according to height. 
The optimal sustainable yield of a periodic harvest is calculated when the trees of different height classes can have different 
economic values. 


Prerequisites 

Matrix Operations 


Optimal Sustainable Yield 

Our objective is to introduce a simplified model for the sustainable harvesting of a forest whose trees are classified by height. The 
height of a tree is assumed to determine its economic value when it is cut down and sold. Initially, there is a distribution of trees 
of various heights. The forest is then allowed to grow for a certain period of time, after which some of the trees of various heights 
are harvested. The trees left unharvested are to be of the same height configuration as the original forest, so that the harvest is 
sustainable. As we will see, there are many such sustainable harvesting procedures. We want to find one for which the total 
economic value of all the trees removed is as large as possible. This determines the optimal sustainable yield of the forest and is 
the largest yield that can be attained continually without depleting the forest. 


The Model 

Suppose that a harvester has a forest of Douglas fir trees that are to be sold as Christmas trees year after year. Every December 
the harvester cuts down some of the trees to be sold. For each tree cut down, a seedling is planted in its place. In this way the total 
number of trees in the forest is always the same. (In this simplified model, we will not take into account trees that die between 
harvests. We assume that every seedling planted survives and grows until it is harvested.) 

In the marketplace, trees of different heights have different economic values. Suppose that there are n different price classes
corresponding to certain height intervals, as shown in Table 1 and Figure 10.9.1. The first class consists of seedlings with heights
in the interval $[0, h_1)$, and these seedlings are of no economic value. The nth class consists of trees with heights greater than or
equal to $h_{n-1}$.



Figure 10.9.1 


Table 1

Class            Value (dollars)    Height Interval
1 (seedlings)    None               $[0, h_1)$
2                $p_2$              $[h_1, h_2)$
3                $p_3$              $[h_2, h_3)$
⋮                ⋮                  ⋮
n − 1            $p_{n-1}$          $[h_{n-2}, h_{n-1})$
n                $p_n$              $[h_{n-1}, \infty)$


Let $x_i$ (i = 1, 2, ..., n) be the number of trees within the ith class that remain after each harvest. We form a column vector with
the numbers $x_i$ and call it the nonharvest vector:

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
$$

For a sustainable harvesting policy, the forest is to be returned after each harvest to the fixed configuration given by the 
nonharvest vector x. Part of our problem is to find those nonharvest vectors x for which sustainable harvesting is possible. 


Because the total number of trees in the forest is fixed, we can set

$$
x_1 + x_2 + \cdots + x_n = s \tag{1}
$$


where s is predetermined by the amount of land available and the amount of space each tree requires. Referring to Figure 10.9.2, 
we have the following situation. The forest configuration is given by the vector x after each harvest. Between harvests the trees 
grow and produce a new forest configuration before each harvest. A certain number of trees are removed from each class at the 
harvest. Finally, a seedling is planted in place of each tree removed, to return the forest again to the configuration x. 


Figure 10.9.2 (diagram labels: Forest before growth (nonharvest vector x); Forest after growth; Trees not removed; Forest after harvest (nonharvest vector x); Same forest configuration)


Consider first the growth of the forest between harvests. During this period a tree in the $i$th class may grow and move up to a 
higher height class, or its growth may be retarded for some reason, in which case it remains in the same class. We consequently define 
the following growth parameters $g_i$ for $i = 1, 2, \ldots, n - 1$: 

$g_i$ = the fraction of trees in the $i$th class that grow into the $(i + 1)$-st class during a growth period 


For simplicity we assume that a tree can move at most one height class upward in one growth period. With this assumption, we 
have 

$1 - g_i$ = the fraction of trees in the $i$th class that remain in the $i$th class during a growth period 

With these $n - 1$ growth parameters, we form the following $n \times n$ growth matrix: 


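$$G = \begin{bmatrix}
1-g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & 1-g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & 1-g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1-g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 1
\end{bmatrix} \tag{2}$$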


Because the entries of the vector $\mathbf{x}$ are the numbers of trees in the $n$ classes before the growth period, you can verify that the 
entries of the vector 

$$G\mathbf{x} = \begin{bmatrix}
(1-g_1)x_1 \\
g_1x_1 + (1-g_2)x_2 \\
g_2x_2 + (1-g_3)x_3 \\
\vdots \\
g_{n-2}x_{n-2} + (1-g_{n-1})x_{n-1} \\
g_{n-1}x_{n-1} + x_n
\end{bmatrix} \tag{3}$$

are the numbers of trees in the $n$ classes after the growth period. 
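As a quick numerical illustration of Equations 2 and 3, the following Python sketch builds $G$ from a list of growth parameters and applies it to a nonharvest vector. The helper names `growth_matrix` and `mat_vec` are hypothetical, and the three-class forest in the usage lines uses made-up numbers rather than data from the text.

```python
def growth_matrix(g):
    """Build the n x n growth matrix G of Equation 2 from g = [g_1, ..., g_{n-1}]."""
    n = len(g) + 1
    G = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        G[i][i] = 1.0 - g[i]   # fraction of class i+1 (1-based) that stays put
        G[i + 1][i] = g[i]     # fraction of class i+1 that moves up one class
    G[n - 1][n - 1] = 1.0      # the tallest class has no class above it
    return G


def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical three-class forest (illustrative numbers only).
g = [0.5, 0.25]
x = [100, 60, 40]
after_growth = mat_vec(growth_matrix(g), x)
print(after_growth)               # [50.0, 95.0, 55.0]
print(sum(after_growth), sum(x))  # 200.0 200
```

Because each column of $G$ sums to 1, growth between harvests only moves trees between classes; the total number of trees is unchanged.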


Suppose that during the harvest we remove $y_i$ ($i = 1, 2, \ldots, n$) trees from the $i$th class. We will call the column vector 

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

the harvest vector. Thus, a total of 

$$y_1 + y_2 + \cdots + y_n$$

trees are removed at each harvest. This is also the total number of trees added to the first class (the new seedlings) after each 
harvest. If we define the following $n \times n$ replacement matrix 

$$R = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix} \tag{4}$$


then the column vector 

$$R\mathbf{y} = \begin{bmatrix} y_1 + y_2 + \cdots + y_n \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{5}$$

specifies the configuration of trees planted after each harvest. 


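At this point we are ready to write the following equation, which characterizes a sustainable harvesting policy: 

$$\begin{bmatrix}\text{configuration}\\\text{at end of}\\\text{growth period}\end{bmatrix} - \begin{bmatrix}\text{harvest}\end{bmatrix} + \begin{bmatrix}\text{new seedling}\\\text{replacement}\end{bmatrix} = \begin{bmatrix}\text{configuration}\\\text{at beginning of}\\\text{growth period}\end{bmatrix}$$

or mathematically, 

$$G\mathbf{x} - \mathbf{y} + R\mathbf{y} = \mathbf{x}$$

This equation can be rewritten as 

$$(I - R)\mathbf{y} = (G - I)\mathbf{x} \tag{6}$$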


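or more comprehensively as 

$$\begin{bmatrix}
0 & -1 & -1 & \cdots & -1 & -1 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
=
\begin{bmatrix}
-g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & -g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & -g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}$$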


We will refer to Equation 6 as the sustainable harvesting condition. Any vectors $\mathbf{x}$ and $\mathbf{y}$ with nonnegative entries, and such that 
$x_1 + x_2 + \cdots + x_n = s$, which satisfy this matrix equation, determine a sustainable harvesting policy for the forest. Note that 
if $y_1 > 0$, then the harvester is removing seedlings of no economic value and replacing them with new seedlings. Because there is 
no point in doing this, we assume that 

$$y_1 = 0 \tag{7}$$


With this assumption, it can be verified that Equation 6 is the matrix form of the following set of equations: 

$$\begin{aligned}
y_2 + y_3 + \cdots + y_n &= g_1x_1 \\
y_2 &= g_1x_1 - g_2x_2 \\
y_3 &= g_2x_2 - g_3x_3 \\
&\;\;\vdots \\
y_{n-1} &= g_{n-2}x_{n-2} - g_{n-1}x_{n-1} \\
y_n &= g_{n-1}x_{n-1}
\end{aligned} \tag{8}$$

Note that the first equation in 8 is the sum of the remaining $n - 1$ equations. 


Because we must have $y_i \ge 0$ for $i = 2, 3, \ldots, n$, Equations 8 require that 

$$g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0 \tag{9}$$


Conversely, if $\mathbf{x}$ is a column vector with nonnegative entries that satisfy Equation 9, then Equations 7 and 8 define a column vector $\mathbf{y}$ with 
nonnegative entries. Furthermore, $\mathbf{x}$ and $\mathbf{y}$ then satisfy the sustainable harvesting condition 6. In other words, a necessary and 
sufficient condition for a nonnegative column vector $\mathbf{x}$ to determine a forest configuration that is capable of sustainable 
harvesting is that its entries satisfy 9. 
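The equivalence just stated can be checked on a small example. The sketch below, with hypothetical function names and illustrative (not source) data, tests Equation 9, builds $\mathbf{y}$ from Equations 7 and 8, and then verifies the sustainable harvesting condition $G\mathbf{x} - \mathbf{y} + R\mathbf{y} = \mathbf{x}$, computing $G\mathbf{x}$ entry by entry from Equation 3 and $R\mathbf{y}$ from Equation 5. It assumes at least two height classes.

```python
def satisfies_condition_9(g, x):
    """Equation 9: g_1 x_1 >= g_2 x_2 >= ... >= g_{n-1} x_{n-1} >= 0."""
    gx = [g[i] * x[i] for i in range(len(g))]
    return all(a >= b for a, b in zip(gx, gx[1:])) and gx[-1] >= 0


def harvest_vector(g, x):
    """Equations 7 and 8: y_1 = 0 and y_i = g_{i-1} x_{i-1} - g_i x_i (g_n x_n taken as 0)."""
    gx = [g[i] * x[i] for i in range(len(g))] + [0.0]
    return [0.0] + [gx[i - 1] - gx[i] for i in range(1, len(x))]


def after_growth(g, x):
    """The entries of G x, written out as in Equation 3."""
    n = len(x)
    out = [(1.0 - g[0]) * x[0]]
    for i in range(1, n - 1):
        out.append(g[i - 1] * x[i - 1] + (1.0 - g[i]) * x[i])
    out.append(g[n - 2] * x[n - 2] + x[n - 1])
    return out


def is_sustainable(g, x, y, tol=1e-9):
    """Check G x - y + R y = x, where R y = (y_1 + ... + y_n, 0, ..., 0) as in Equation 5."""
    grown = after_growth(g, x)
    replanted = [sum(y)] + [0.0] * (len(x) - 1)
    return all(abs(grown[i] - y[i] + replanted[i] - x[i]) <= tol for i in range(len(x)))


g = [0.5, 0.25]    # hypothetical growth parameters
x = [100, 60, 40]  # hypothetical nonharvest vector
print(satisfies_condition_9(g, x))  # True, since 50 >= 15 >= 0
y = harvest_vector(g, x)
print(y)                            # [0.0, 35.0, 15.0]
print(is_sustainable(g, x, y))      # True
```

If Equation 9 fails, for example with $\mathbf{x} = (10, 60, 40)$, the same formulas produce a negative $y_2$, which is not an allowable harvest.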


Optimal Sustainable Yield 

Because we remove $y_i$ trees from the $i$th class ($i = 2, 3, \ldots, n$) and each tree in the $i$th class has an economic value of $p_i$, the 
total yield of the harvest, $\mathrm{Yld}$, is given by 

$$\mathrm{Yld} = p_2y_2 + p_3y_3 + \cdots + p_ny_n \tag{10}$$
















Using 8, we may substitute for the $y_i$'s in 10 to obtain 

$$\mathrm{Yld} = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1} \tag{11}$$


Combining 11, 1, and 9, we can now state the problem of maximizing the yield of the forest over all possible sustainable 
harvesting policies as follows: 

Problem 

Find nonnegative numbers $x_1, x_2, \ldots, x_n$ that maximize 

$$\mathrm{Yld} = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1}$$

subject to 

$$x_1 + x_2 + \cdots + x_n = s$$

and 

$$g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0$$

As formulated above, this problem belongs to the field of linear programming. However, we will illustrate the following result, 
without linear programming theory, by actually exhibiting a sustainable harvesting policy. 


Theorem 10.9.1 Optimal Sustainable Yield 

The optimal sustainable yield is achieved by harvesting all the trees from one particular height class and none of the trees 
from any other height class. 


Let us first set 

$\mathrm{Yld}_k$ = yield obtained by harvesting all of the $k$th class and none of the other classes 

The largest value of $\mathrm{Yld}_k$ for $k = 2, 3, \ldots, n$ will then be the optimal sustainable yield, and the corresponding value of $k$ will be 
the class that should be completely harvested to attain the optimal sustainable yield. Because no class but the $k$th is harvested, we 
have 


$$y_2 = y_3 = \cdots = y_{k-1} = y_{k+1} = \cdots = y_n = 0 \tag{12}$$

In addition, because all of the $k$th class is harvested, no trees are ever present in the height classes above the $k$th class. Thus, 

$$x_k = x_{k+1} = \cdots = x_n = 0 \tag{13}$$

Substituting 12 and 13 into Equations 8 gives 

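$$\begin{aligned}
y_k &= g_1x_1 \\
0 &= g_1x_1 - g_2x_2 \\
&\;\;\vdots \\
0 &= g_{k-2}x_{k-2} - g_{k-1}x_{k-1} \\
y_k &= g_{k-1}x_{k-1}
\end{aligned} \tag{14}$$

Equations 14 can also be written as 

$$y_k = g_1x_1 = g_2x_2 = \cdots = g_{k-1}x_{k-1} \tag{15}$$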


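from which it follows that 

$$\begin{aligned}
x_2 &= g_1x_1/g_2 \\
x_3 &= g_1x_1/g_3 \\
&\;\;\vdots \\
x_{k-1} &= g_1x_1/g_{k-1}
\end{aligned} \tag{16}$$

If we substitute Equations 13 and 16 into 

$$x_1 + x_2 + \cdots + x_n = s$$

[which is Equation 1], we can solve for $x_1$ and obtain 

$$x_1 = \frac{s}{1 + \dfrac{g_1}{g_2} + \dfrac{g_1}{g_3} + \cdots + \dfrac{g_1}{g_{k-1}}} \tag{17}$$

For the yield $\mathrm{Yld}_k$, we combine 10, 12, 15, and 17 to obtain 

$$\begin{aligned}
\mathrm{Yld}_k &= p_2y_2 + p_3y_3 + \cdots + p_ny_n \\
&= p_ky_k \\
&= p_kg_1x_1 \\
&= \frac{p_ks}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\end{aligned} \tag{18}$$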


Equation 18 determines $\mathrm{Yld}_k$ in terms of the known growth and economic parameters for any $k = 2, 3, \ldots, n$. Thus, the optimal 
sustainable yield is found as follows. 


Theorem 10.9.2 Finding the Optimal Sustainable Yield 

The optimal sustainable yield is the largest value of 

$$\frac{p_ks}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}$$

for $k = 2, 3, \ldots, n$. The corresponding value of $k$ is the number of the class that is completely harvested. 
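Theorem 10.9.2 reduces the search to comparing $n - 1$ numbers, which is easy to automate. The following is a minimal Python sketch, with hypothetical function names, that evaluates Equation 18 for every $k$ and picks the largest; the usage lines plug in the growth parameters and prices of Example 1 with $s = 1$, so the printed values are yields per tree.

```python
def sustainable_yields(p, g, s=1.0):
    """Return {k: Yld_k} for k = 2, ..., n using Equation 18.

    p maps each class number k (2..n) to its price p_k;
    g = [g_1, ..., g_{n-1}] are the growth parameters.
    """
    yields = {}
    denom = 0.0
    for k in range(2, len(g) + 2):
        denom += 1.0 / g[k - 2]          # running sum 1/g_1 + ... + 1/g_{k-1}
        yields[k] = p[k] * s / denom
    return yields


def optimal_class(p, g, s=1.0):
    """Return (k, Yld_k) for the class whose complete harvest maximizes the yield."""
    yields = sustainable_yields(p, g, s)
    k_best = max(yields, key=yields.get)
    return k_best, yields[k_best]


# Growth parameters and prices from Example 1, per tree (s = 1).
g = [0.28, 0.31, 0.25, 0.23, 0.37]
p = {2: 50, 3: 100, 4: 150, 5: 200, 6: 250}
for k, value in sustainable_yields(p, g).items():
    print(k, round(value, 1))   # 14.0, 14.7, 13.9, 13.2, 14.0 for k = 2, ..., 6
print(optimal_class(p, g))      # class 3, about 14.71 per tree
```

Keeping a running sum of $1/g_i$ evaluates all the yields in a single pass over the growth parameters.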


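In Exercise 4 we ask you to show that the nonharvest vector $\mathbf{x}$ for the optimal sustainable yield is 

$$\mathbf{x} = \frac{s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\begin{bmatrix} 1/g_1 \\ 1/g_2 \\ \vdots \\ 1/g_{k-1} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{19}$$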


Theorem 10.9.2 implies that it is not necessarily the highest-priced class of trees that should be totally cropped. The growth 
parameters $g_i$ must also be taken into account to determine the optimal sustainable yield. 
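Equation 19 can also be evaluated directly. This sketch (hypothetical function name; the forest size $s = 1000$ is an illustrative choice, not from the text) computes the optimal nonharvest vector for the growth parameters of Example 1 with $k = 3$, then checks that its entries sum to $s$ and that $g_1x_1 = g_2x_2$, as Equation 15 requires.

```python
def optimal_nonharvest_vector(g, k, s):
    """Equation 19: x_i is proportional to 1/g_i for i < k, and x_k = ... = x_n = 0."""
    n = len(g) + 1
    inv = [1.0 / g[i] for i in range(k - 1)]   # 1/g_1, ..., 1/g_{k-1}
    scale = s / sum(inv)
    return [scale * v for v in inv] + [0.0] * (n - k + 1)


g = [0.28, 0.31, 0.25, 0.23, 0.37]             # growth parameters of Example 1
x = optimal_nonharvest_vector(g, k=3, s=1000)  # s = 1000 is an illustrative forest size
print([round(v, 1) for v in x])   # [525.4, 474.6, 0.0, 0.0, 0.0, 0.0]
print(round(sum(x), 6))           # 1000.0
print(g[0] * x[0], g[1] * x[1])   # equal (up to rounding), as Equation 15 requires
```

With these values about 147.1 trees are removed from the third class at each harvest; at 100 dollars per tree that is roughly 14,712 dollars, consistent with a yield of $14.7s$ for $s = 1000$.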


EXAMPLE 1 Using Theorem 10.9.2 












For a Scots pine forest in Scotland with a growth period of six years, the following growth matrix was found (see 
M. B. Usher, “A Matrix Approach to the Management of Renewable Resources, with Special Reference to Selection 
Forests,” Journal of Applied Ecology, vol. 3, 1966, pp. 355–367): 

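$$G = \begin{bmatrix}
.72 & 0 & 0 & 0 & 0 & 0 \\
.28 & .69 & 0 & 0 & 0 & 0 \\
0 & .31 & .75 & 0 & 0 & 0 \\
0 & 0 & .25 & .77 & 0 & 0 \\
0 & 0 & 0 & .23 & .63 & 0 \\
0 & 0 & 0 & 0 & .37 & 1.00
\end{bmatrix}$$

Suppose that the prices of trees in the five tallest height classes are 

$$p_2 = \$50, \quad p_3 = \$100, \quad p_4 = \$150, \quad p_5 = \$200, \quad p_6 = \$250$$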


Which class should be completely harvested to obtain the optimal sustainable yield, and what is that yield? 


From matrix $G$ we have that 

$$g_1 = .28, \quad g_2 = .31, \quad g_3 = .25, \quad g_4 = .23, \quad g_5 = .37$$

Equation 18 then gives 

$$\begin{aligned}
\mathrm{Yld}_2 &= 50s/(.28^{-1}) = 14.0s \\
\mathrm{Yld}_3 &= 100s/(.28^{-1} + .31^{-1}) = 14.7s \\
\mathrm{Yld}_4 &= 150s/(.28^{-1} + .31^{-1} + .25^{-1}) = 13.9s \\
\mathrm{Yld}_5 &= 200s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1}) = 13.2s \\
\mathrm{Yld}_6 &= 250s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1} + .37^{-1}) = 14.0s
\end{aligned}$$


We see that $\mathrm{Yld}_3$ is the largest of these five quantities, so from Theorem 10.9.2 the third class should be completely 
harvested every six years to maximize the sustainable yield. The corresponding optimal sustainable yield is $\$14.7s$, 
where $s$ is the total number of trees in the forest. 


Exercise Set 10.9 

1. A certain forest is divided into three height classes and has a growth matrix between harvests given by 

0 0 


If the price of trees in the second class is $30 and the price of trees in the third class is $50, which class should be completely 
harvested to attain the optimal sustainable yield? What is the optimal yield if there are 1000 trees in the forest? 

Answer: 

The second class; $15,000 

2. In Example 1, to what level must the price of trees in the fifth class rise so that the fifth class is the one to harvest completely 
in order to attain the optimal sustainable yield? 

Answer: 

$223 

3. In Example 1, what must the ratio of the prices $p_2 : p_3 : p_4 : p_5 : p_6$ be in order that the yields $\mathrm{Yld}_k$, $k = 2, 3, 4, 5, 6$, all be the 
same? (In this case, any sustainable harvesting policy will produce the same optimal sustainable yield.) 

Answer: 

1:1.90:3.02:4.24:5.00 

4. Derive Equation 19 for the nonharvest vector x corresponding to the optimal sustainable harvesting policy described in 
Theorem 10.9.2. 

5. For the optimal sustainable harvesting policy described in Theorem 10.9.2, how many trees are removed from the forest 
during each harvest? 

Answer: 

$s/(g_1^{-1} + g_2^{-1} + \cdots + g_{k-1}^{-1})$ 

6. If all the growth parameters $g_1, g_2, \ldots, g_{n-1}$ in the growth matrix $G$ are equal, what should the ratio of the prices 
$p_2 : p_3 : \cdots : p_n$ be in order that any sustainable harvesting policy be an optimal sustainable harvesting policy? (See Exercise 3.) 

Answer: 

$1 : 2 : 3 : \cdots : n - 1$ 

Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, 
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some 
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 


T1. A particular forest has growth parameters given by 

$$g_i = \frac{1}{i}$$

for $i = 1, 2, 3, \ldots, n - 1$, where $n$ (the total number of height classes) can be chosen as large as needed. Suppose that the value of 
a tree in the $k$th height interval is given by 

$$p_k = a(k-1)^{\rho}$$

where $a$ is a constant (in dollars) and $\rho$ is a parameter satisfying $1 \le \rho \le 2$. 

(a) Show that the yield $\mathrm{Yld}_k$ is given by 

$$\mathrm{Yld}_k = \frac{2a(k-1)^{\rho-1}s}{k}$$

(b) For 

$\rho = 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9$ 

use a computer to determine the class number that should be completely harvested, and determine the optimal sustainable 
yield in each case. Make sure that you allow $k$ to take on only integer values in your calculations. 

(c) Repeat the calculations in part (b) using 

$\rho = 1.91, 1.92, 1.93, 1.94, 1.95, 1.96, 1.97, 1.98, 1.99$ 

(d) Show that if $\rho = 2$, then the optimal sustainable yield can never be larger than $2as$. 

(e) Compare the values of $k$ determined in parts (b) and (c) to $1/(2 - \rho)$, and use some calculus to explain why 

$$k \simeq \frac{1}{2 - \rho}$$


T2. A particular forest has growth parameters given by 

$$g_i = \frac{1}{2^i}$$

for $i = 1, 2, 3, \ldots, n - 1$, where $n$ (the total number of height classes) can be chosen as large as needed. Suppose that the value of 
a tree in the $k$th height interval is given by 

$$p_k = a(k-1)^{\rho}$$

where $a$ is a constant (in dollars) and $\rho$ is a parameter satisfying $1 \le \rho$. 

(a) Show that the yield $\mathrm{Yld}_k$ is given by 

$$\mathrm{Yld}_k = \frac{a(k-1)^{\rho}s}{2^k - 2}$$

(b) For 

$\rho = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10$ 

use a computer to determine the class number that should be completely harvested in order to obtain an optimal yield, and 
determine the optimal sustainable yield in each case. Make sure that you allow $k$ to take on only integer values in your 
calculations. 

(c) Compare the values of $k$ determined in part (b) to $1 + \rho/\ln(2)$ and use some calculus to explain why 

$$k \simeq 1 + \frac{\rho}{\ln(2)}$$





10.10 Computer Graphics 

In this section we assume that a view of a three-dimensional object is displayed on a video screen and show 
how matrix algebra can be used to obtain new views of the object by rotation, translation, and scaling. 


Prerequisites 

Matrix Algebra 
Analytic Geometry 


Visualization of a Three-Dimensional Object 

Suppose that we want to visualize a three-dimensional object by displaying various views of it on a video 
screen. The object we have in mind to display is to be determined by a finite number of straight line segments. 
As an example, consider the truncated right pyramid with hexagonal base illustrated in Figure 10.10.1. We first 
introduce an xyz-coordinate system in which to embed the object. As in Figure 10.10.1, we orient the coordinate 
system so that its origin is at the center of the video screen and the xy-plane coincides with the plane of the 
screen. Consequently, an observer will see only the projection of the view of the three-dimensional object onto 
the two-dimensional xy-plane. 





Figure 10.10.1 

In the xyz-coordinate system, the endpoints $P_1, P_2, \ldots, P_n$ of the straight line segments that determine the view 
of the object will have certain coordinates, say, 

$$(x_1, y_1, z_1), \quad (x_2, y_2, z_2), \quad \ldots, \quad (x_n, y_n, z_n)$$

These coordinates, together with a specification of which pairs are to be connected by straight line segments, 
are to be stored in the memory of the video display system. For example, assume that the 12 vertices of the 
truncated pyramid in Figure 10.10.1 have the following coordinates (the screen is 4 units wide by 3 units high): 


$P_1$: (1.000, -.800, .000),     $P_2$: (.500, -.800, -.866),
$P_3$: (-.500, -.800, -.866),    $P_4$: (-1.000, -.800, .000),
$P_5$: (-.500, -.800, .866),     $P_6$: (.500, -.800, .866),
$P_7$: (.840, -.400, .000),      $P_8$: (.315, .125, -.546),
$P_9$: (-.210, .650, -.364),     $P_{10}$: (-.360, .800, .000),
$P_{11}$: (-.210, .650, .364),   $P_{12}$: (.315, .125, .546)


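These 12 vertices are connected pairwise by 18 straight line segments as follows, where $P_i \leftrightarrow P_j$ denotes that 
point $P_i$ is connected to point $P_j$: 

$P_1 \leftrightarrow P_2$, $P_2 \leftrightarrow P_3$, $P_3 \leftrightarrow P_4$, $P_4 \leftrightarrow P_5$, $P_5 \leftrightarrow P_6$, $P_6 \leftrightarrow P_1$

$P_7 \leftrightarrow P_8$, $P_8 \leftrightarrow P_9$, $P_9 \leftrightarrow P_{10}$, $P_{10} \leftrightarrow P_{11}$, $P_{11} \leftrightarrow P_{12}$, $P_{12} \leftrightarrow P_7$

$P_1 \leftrightarrow P_7$, $P_2 \leftrightarrow P_8$, $P_3 \leftrightarrow P_9$, $P_4 \leftrightarrow P_{10}$, $P_5 \leftrightarrow P_{11}$, $P_6 \leftrightarrow P_{12}$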

In View 1 these 18 straight line segments are shown as they would appear on the video screen. Note that 
only the x- and y-coordinates of the vertices are needed by the video display system to draw the 
view, because only the projection of the object onto the xy-plane is displayed. However, we must keep track of 
the z-coordinates to carry out certain transformations discussed later. 


View 1 


We now show how to form new views of the object by scaling, translating, or rotating the initial view. We first 
construct a $3 \times n$ matrix $P$, referred to as the coordinate matrix of the view, whose columns are the coordinates 
of the $n$ points of a view: 

$$P = \begin{bmatrix}
x_1 & x_2 & \cdots & x_n \\
y_1 & y_2 & \cdots & y_n \\
z_1 & z_2 & \cdots & z_n
\end{bmatrix}$$


For example, the coordinate matrix $P$ corresponding to View 1 is the $3 \times 12$ matrix 

$$\begin{bmatrix}
1.000 & .500 & -.500 & -1.000 & -.500 & .500 & .840 & .315 & -.210 & -.360 & -.210 & .315 \\
-.800 & -.800 & -.800 & -.800 & -.800 & -.800 & -.400 & .125 & .650 & .800 & .650 & .125 \\
.000 & -.866 & -.866 & .000 & .866 & .866 & .000 & -.546 & -.364 & .000 & .364 & .546
\end{bmatrix}$$


We will show below how to transform the coordinate matrix $P$ of a view to a new coordinate matrix $P'$ 
corresponding to a new view of the object. The straight line segments connecting the various points move with 
the points as they are transformed. In this way, each view is uniquely determined by its coordinate matrix once 
we have specified which pairs of points in the original view are to be connected by straight lines. 






Scaling 


The first type of transformation we consider consists of scaling a view along the x, y, and z directions by factors 
of $\alpha$, $\beta$, and $\gamma$, respectively. By this we mean that if a point $P_i$ has coordinates $(x_i, y_i, z_i)$ in the original view, it 
is to move to a new point $P_i'$ with coordinates $(\alpha x_i, \beta y_i, \gamma z_i)$ in the new view. This has the effect of 
transforming a unit cube in the original view to a rectangular parallelepiped of dimensions $\alpha \times \beta \times \gamma$ (Figure 
10.10.2). Mathematically, this may be accomplished with matrix multiplication as follows. Define a $3 \times 3$ 
diagonal matrix 


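$$S = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}$$

Then, if a point $P_i$ in the original view is represented by the column vector 

$$\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$

then the transformed point $P_i'$ is represented by the column vector 

$$\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix} = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$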


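Using the coordinate matrix $P$, which contains the coordinates of all $n$ points of the original view as its columns, 
we can transform these $n$ points simultaneously to produce the coordinate matrix $P'$ of the scaled view, as 
follows: 

$$\begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}
\begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \\ z_1 & z_2 & \cdots & z_n \end{bmatrix}
= \begin{bmatrix} \alpha x_1 & \alpha x_2 & \cdots & \alpha x_n \\ \beta y_1 & \beta y_2 & \cdots & \beta y_n \\ \gamma z_1 & \gamma z_2 & \cdots & \gamma z_n \end{bmatrix} = P'$$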



The new coordinate matrix can then be entered into the video display system to produce the new view of the 
object. As an example, View 2 is View 1 scaled by setting $\alpha = 1.8$, $\beta = 0.5$, and $\gamma = 3.0$. Note that the scaling 
$\gamma = 3.0$ along the z-axis is not visible in View 2, since we see only the projection of the object onto the 
xy-plane. 




















Figure 10.10.2 


View 1 scaled by $\alpha = 1.8$, $\beta = 0.5$, $\gamma = 3.0$ 
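A minimal Python sketch of $P' = SP$ follows. It uses plain nested lists so it runs without third-party packages; `mat_mul` and `scaling_matrix` are hypothetical helper names, the input columns are $P_1$ and $P_2$ from View 1, and the scale factors $(2.0, 0.5, 3.0)$ are illustrative rather than those of View 2.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def scaling_matrix(alpha, beta, gamma):
    """The diagonal matrix S that scales x, y, z by alpha, beta, gamma."""
    return [[alpha, 0.0, 0.0],
            [0.0, beta, 0.0],
            [0.0, 0.0, gamma]]


# Columns P1 and P2 of the View 1 coordinate matrix.
P = [[1.000, 0.500],
     [-0.800, -0.800],
     [0.000, -0.866]]
P_scaled = mat_mul(scaling_matrix(2.0, 0.5, 3.0), P)
for row in P_scaled:
    print([round(v, 3) for v in row])
```

Because $S$ is diagonal, each row of $P$ is simply multiplied by one factor; the general product is used here only to mirror the matrix form in the text.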


Translation 

We next consider the transformation of translating or displacing an object to a new position on the screen. 
Referring to Figure 10.10.3, suppose we desire to change an existing view so that each point $P_i$ with 
coordinates $(x_i, y_i, z_i)$ moves to a new point $P_i'$ with coordinates $(x_i + x_0, y_i + y_0, z_i + z_0)$. The vector 

$$\begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}$$

is called the translation vector of the transformation. By defining a $3 \times n$ matrix $T$ as 








$$T = \begin{bmatrix} x_0 & x_0 & \cdots & x_0 \\ y_0 & y_0 & \cdots & y_0 \\ z_0 & z_0 & \cdots & z_0 \end{bmatrix}$$

we can translate all $n$ points of the view determined by the coordinate matrix $P$ by matrix addition via the 
equation 

$$P' = P + T$$

The coordinate matrix $P'$ then specifies the new coordinates of the $n$ points. For example, if we wish to 
translate View 1 according to the translation vector 

$$\begin{bmatrix} 1.2 \\ 0.4 \\ 1.7 \end{bmatrix}$$

the result is View 3. Note, again, that the translation $z_0 = 1.7$ along the z-axis does not show up explicitly in 
View 3. 

View 1 translated by $x_0 = 1.2$, $y_0 = 0.4$, $z_0 = 1.7$. 



In Exercise 7, a technique of performing translations by matrix multiplication rather than by matrix addition is 
explained. 
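Translation by $P' = P + T$ amounts to adding the translation vector to every column of $P$. In the sketch below (hypothetical helper `translate`), the matrix $T$ is never formed explicitly; the input is again the first two columns of View 1, moved by the translation vector used for View 3.

```python
def translate(P, t):
    """Return P + T, where every column of T equals the translation vector t = (x0, y0, z0)."""
    return [[value + t[r] for value in P[r]] for r in range(3)]


# Columns P1 and P2 of View 1, moved by the translation vector used for View 3.
P = [[1.000, 0.500],
     [-0.800, -0.800],
     [0.000, -0.866]]
P_moved = translate(P, (1.2, 0.4, 1.7))
for row in P_moved:
    print([round(v, 3) for v in row])
```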


Rotation 

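A more complicated type of transformation is a rotation of a view about one of the three coordinate axes. We 
begin with a rotation about the z-axis (the axis perpendicular to the screen) through an angle $\theta$. Given a point $P_i$ 
in the original view with coordinates $(x_i, y_i, z_i)$, we wish to compute the new coordinates $(x_i', y_i', z_i')$ of the 
rotated point $P_i'$. Referring to Figure 10.10.4 and using a little trigonometry, you should be able to derive the 
following, where $\rho$ is the distance from $P_i$ to the z-axis and $\phi$ is the angle from the positive x-axis to the 
projection of $P_i$ onto the xy-plane, so that $x_i = \rho\cos\phi$ and $y_i = \rho\sin\phi$: 

$$\begin{aligned}
x_i' &= \rho\cos(\phi + \theta) = \rho\cos\phi\cos\theta - \rho\sin\phi\sin\theta = x_i\cos\theta - y_i\sin\theta \\
y_i' &= \rho\sin(\phi + \theta) = \rho\cos\phi\sin\theta + \rho\sin\phi\cos\theta = x_i\sin\theta + y_i\cos\theta \\
z_i' &= z_i
\end{aligned}$$

These equations can be written in matrix form as 

$$\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$

If we let $R$ denote the $3 \times 3$ matrix in this equation, all $n$ points can be rotated by the matrix product $P' = RP$ to 
yield the coordinate matrix $P'$ of the rotated view. 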
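The rotation $P' = RP$ about the z-axis can be sketched the same way. The helper names below are hypothetical; the angle is passed in radians, as Python's `math.cos` and `math.sin` expect.

```python
import math


def rotation_z(theta):
    """Rotation matrix about the z-axis through angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Rotate P1 = (1, -.8, 0) and P7 = (.84, -.4, 0) of View 1 through 90 degrees.
P = [[1.000, 0.840],
     [-0.800, -0.400],
     [0.000, 0.000]]
P_rot = mat_mul(rotation_z(math.radians(90)), P)
for row in P_rot:
    print([round(v, 3) for v in row])
```

A 90° rotation sends $(x, y, z)$ to $(-y, x, z)$, so $P_1$ moves to approximately $(.8, 1, 0)$; for any angle the distance $\rho$ from the z-axis and the z-coordinate are unchanged.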



Figure 10.10.4 

Rotations about the x- and y-axes can be accomplished analogously, and the resulting rotation matrices are 
given with Views 4, 5, and 6. These three new views of the truncated pyramid correspond to rotations of View 1 
about the x-, y-, and z-axes, respectively, each through an angle of 90°. 

Rotation about the x-axis 

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$$

View 1 rotated 90° about the x-axis. 

Rotation about the y-axis 

$$\begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}$$

View 1 rotated 90° about the y-axis. 

Rotation about the z-axis 

$$\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

View 1 rotated 90° about the z-axis. 


Rotations about the three coordinate axes may be combined to give oblique views of an object. For example, View 
7 is View 1 rotated first about the x-axis through 30°, then about the y-axis through −70°, and finally about the 
z-axis through −27°. Mathematically, these three successive rotations can be embodied in the single 
transformation equation $P' = RP$, where $R$ is the product of three individual rotation matrices 

$$R_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(30^\circ) & -\sin(30^\circ) \\ 0 & \sin(30^\circ) & \cos(30^\circ) \end{bmatrix}$$

$$R_2 = \begin{bmatrix} \cos(-70^\circ) & 0 & \sin(-70^\circ) \\ 0 & 1 & 0 \\ -\sin(-70^\circ) & 0 & \cos(-70^\circ) \end{bmatrix}$$

$$R_3 = \begin{bmatrix} \cos(-27^\circ) & -\sin(-27^\circ) & 0 \\ \sin(-27^\circ) & \cos(-27^\circ) & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

in the order 

$$R = R_3R_2R_1 = \begin{bmatrix} .305 & -.025 & -.952 \\ -.155 & .985 & -.076 \\ .940 & .171 & .296 \end{bmatrix}$$


















Oblique view of truncated pyramid. 
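The matrix $R = R_3R_2R_1$ for View 7 can be reproduced numerically. This sketch (hypothetical helper names; angles in degrees) builds the three axis rotations in the forms given with Views 4, 5, and 6 and multiplies them in the stated order; rounding the result to three decimals should reproduce the entries printed above.

```python
import math


def rot_x(deg):
    t = math.radians(deg)
    return [[1.0, 0.0, 0.0], [0.0, math.cos(t), -math.sin(t)], [0.0, math.sin(t), math.cos(t)]]


def rot_y(deg):
    t = math.radians(deg)
    return [[math.cos(t), 0.0, math.sin(t)], [0.0, 1.0, 0.0], [-math.sin(t), 0.0, math.cos(t)]]


def rot_z(deg):
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0], [math.sin(t), math.cos(t), 0.0], [0.0, 0.0, 1.0]]


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# View 7: 30 degrees about x (R1), then -70 about y (R2), then -27 about z (R3).
R = mat_mul(rot_z(-27), mat_mul(rot_y(-70), rot_x(30)))
for row in R:
    print([round(v, 3) for v in row])
```

Reversing the order of the factors generally gives a different matrix, because rotations about different axes do not commute.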


As a final illustration, in View 8 we have two separate views of the truncated pyramid, which constitute a 
stereoscopic pair. They were produced by first rotating View 7 about the y-axis through an angle of −3° and 
translating it to the right, then rotating the same View 7 about the y-axis through an angle of +3° and 
translating it to the left. The translation distances were chosen so that the stereoscopic views are about 2½ 
inches apart, the approximate distance between a pair of eyes. 


Stereoscopic figure of truncated pyramid. The three-dimensionality of the diagram can be seen 
by holding the book about one foot away and focusing on a distant object. Then by shifting your 
gaze to View 8 without refocusing, you can make the two views of the stereoscopic pair merge 
together and produce the desired effect. 


Exercise Set 10.10 


1. View 9 is a view of a square with vertices (0, 0, 0), (1, 0, 0), (1, 1,0), and (0, 1,0). 

(a) What is the coordinate matrix of View 9? 

(b) What is the coordinate matrix of View 9 after it is scaled by a factor of 3/2 in the x-direction and 1/2 in the 
y-direction? Draw a sketch of the scaled view. 

(c) What is the coordinate matrix of View 9 after it is translated by the following vector? 

$$\begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix}$$

Draw a sketch of the translated view. 




(d) What is the coordinate matrix of View 9 after it is rotated through an angle of —30° about the z-axis? 
Draw a sketch of the rotated view. 


Square with vertices (0, 0, 0), (1, 0, 0), (1, 1,0), and (0, 1, 0) (Exercises 1 and 2) 


Answer: 


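(a) 

$$\begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

(b) 

$$\begin{bmatrix} 0 & \tfrac{3}{2} & \tfrac{3}{2} & 0 \\ 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

(d) 

$$\begin{bmatrix} 0 & .866 & 1.366 & .500 \\ 0 & -.500 & .366 & .866 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$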


2. (a) If the coordinate matrix of View 9 is multiplied by the matrix 

$$\begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

the result is the coordinate matrix of View 10. Such a transformation is called a shear in the x-direction 
with factor $\tfrac{1}{2}$ with respect to the y-coordinate. Show that under such a transformation, a point with 
coordinates $(x_i, y_i, z_i)$ has new coordinates $(x_i + \tfrac{1}{2}y_i, y_i, z_i)$. 

(b) What are the coordinates of the four vertices of the sheared square in View 10? 












(c) The matrix 

$$\begin{bmatrix} 1 & 0 & 0 \\ .6 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

determines a shear in the y-direction with factor .6 with respect to the x-coordinate (an example appears 
in View 11). Sketch a view of the square in View 9 after such a shearing transformation, and find the 
new coordinates of its four vertices. 

View 9 sheared along the x-axis by $\tfrac{1}{2}$ with respect to the y-coordinate (Exercise 2) 

View 1 sheared along the y-axis by .6 with respect to the x-coordinate (Exercise 2). 

Answer: 

(b) (0, 0, 0), (1, 0, 0), $(\tfrac{3}{2}, 1, 0)$, and $(\tfrac{1}{2}, 1, 0)$ 

(c) (0, 0, 0), (1, .6, 0), (1, 1.6, 0), (0, 1, 0) 

3. (a) The reflection about the xz-plane is defined as the transformation that takes a point $(x_i, y_i, z_i)$ to the 
point $(x_i, -y_i, z_i)$ (e.g., View 12). If $P$ and $P'$ are the coordinate matrices of a view and its reflection 
about the xz-plane, respectively, find a matrix $M$ such that $P' = MP$. 

(b) Analogous to part (a), define the reflection about the yz-plane and construct the corresponding 
transformation matrix. Draw a sketch of View 1 reflected about the yz-plane. 

(c) Analogous to part (a), define the reflection about the xy-plane and construct the corresponding 
transformation matrix. Draw a sketch of View 1 reflected about the xy-plane. 




View 1 reflected about the xz-plane (Exercise 3). 


Answer: 


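(a) 

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(b) 

$$\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(c) 

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$$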


4. (a) View 13 is View 1 subject to the following five transformations: 

1. Scale by a factor of ½ in the x-direction, 2 in the y-direction, and 4 in the z-direction. 
2. Translate ½ unit in the x-direction. 
3. Rotate 20° about the x-axis. 
4. Rotate −45° about the y-axis. 
5. Rotate 90° about the z-axis. 

Construct the five matrices $M_1, M_2, M_3, M_4$, and $M_5$ associated with these five transformations. 

(b) If $P$ is the coordinate matrix of View 1 and $P'$ is the coordinate matrix of View 13, express $P'$ in terms 
of $M_1, M_2, M_3, M_4, M_5$, and $P$. 








View 1 scaled, translated, and rotated (Exercise 4) 


Answer: 


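(a) 

$$M_1 = \begin{bmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & \cdots & \tfrac{1}{2} \\ 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \quad
M_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 20^\circ & -\sin 20^\circ \\ 0 & \sin 20^\circ & \cos 20^\circ \end{bmatrix},$$

$$M_4 = \begin{bmatrix} \cos(-45^\circ) & 0 & \sin(-45^\circ) \\ 0 & 1 & 0 \\ -\sin(-45^\circ) & 0 & \cos(-45^\circ) \end{bmatrix}, \quad
M_5 = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(b) $P' = M_5M_4M_3(M_1P + M_2)$ 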


5. (a) View 14 is View 1 subject to the following seven transformations: 

1. Scale by a factor of .3 in the x-direction and by a factor of .5 in the y-direction. 
2. Rotate 45° about the x-axis. 
3. Translate 1 unit in the x-direction. 
4. Rotate 35° about the y-axis. 
5. Rotate −45° about the z-axis. 
6. Translate 1 unit in the z-direction. 
7. Scale by a factor of 2 in the x-direction. 

Construct the matrices $M_1, M_2, \ldots, M_7$ associated with these seven transformations. 

(b) If $P$ is the coordinate matrix of View 1 and $P'$ is the coordinate matrix of View 14, express $P'$ in terms 
of $M_1, M_2, \ldots, M_7$, and $P$. 












View 1 scaled, translated, and rotated (Exercise 5). 


Answer: 


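(a) 

$$M_1 = \begin{bmatrix} .3 & 0 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 45^\circ & -\sin 45^\circ \\ 0 & \sin 45^\circ & \cos 45^\circ \end{bmatrix}, \quad
M_3 = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \end{bmatrix},$$

$$M_4 = \begin{bmatrix} \cos 35^\circ & 0 & \sin 35^\circ \\ 0 & 1 & 0 \\ -\sin 35^\circ & 0 & \cos 35^\circ \end{bmatrix}, \quad
M_5 = \begin{bmatrix} \cos(-45^\circ) & -\sin(-45^\circ) & 0 \\ \sin(-45^\circ) & \cos(-45^\circ) & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

$$M_6 = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \end{bmatrix}, \quad
M_7 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(b) $P' = M_7(M_5M_4(M_2M_1P + M_3) + M_6)$ 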


6. Suppose that a view with coordinate matrix $P$ is to be rotated through an angle $\theta$ about an axis through the 
origin and specified by two angles $\alpha$ and $\beta$ (see Figure Ex-6). If $P'$ is the coordinate matrix of the rotated 
view, find rotation matrices $R_1, R_2, R_3, R_4$, and $R_5$ such that 

$$P' = R_5R_4R_3R_2R_1P$$

[Hint: The desired rotation can be accomplished in the following five steps: 

1. Rotate through an angle of $\beta$ about the y-axis. 
2. Rotate through an angle of $\alpha$ about the z-axis. 
3. Rotate through an angle of $\theta$ about the y-axis. 
4. Rotate through an angle of $-\alpha$ about the z-axis. 
5. Rotate through an angle of $-\beta$ about the y-axis.] 



Answer: 




















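$$R_1 = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad
R_2 = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_3 = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix},$$

$$R_4 = \begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_5 = \begin{bmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{bmatrix}$$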






7. This exercise illustrates a technique for translating a point with coordinates $(x_i, y_i, z_i)$ to a point with 
coordinates $(x_i + x_0, y_i + y_0, z_i + z_0)$ by matrix multiplication rather than matrix addition. 

(a) Let the point $(x_i, y_i, z_i)$ be associated with the column vector 

$$\mathbf{v}_i = \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix}$$

and let the point $(x_i + x_0, y_i + y_0, z_i + z_0)$ be associated with the column vector 

$$\mathbf{v}_i' = \begin{bmatrix} x_i + x_0 \\ y_i + y_0 \\ z_i + z_0 \\ 1 \end{bmatrix}$$

Find a $4 \times 4$ matrix $M$ such that $\mathbf{v}_i' = M\mathbf{v}_i$. 

(b) Find the specific $4 \times 4$ matrix of the above form that will effect the translation of the point $(4, -2, 3)$ 
to the point $(-1, 7, 0)$. 


Answer: 


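(a) 

$$M = \begin{bmatrix} 1 & 0 & 0 & x_0 \\ 0 & 1 & 0 & y_0 \\ 0 & 0 & 1 & z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

(b) 

$$\begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 9 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$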
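The homogeneous-coordinate technique of Exercise 7 can be checked with a few lines of Python. The helper names are hypothetical; the usage lines rebuild the answer to part (b) from the start and end points and apply it to the starting point.

```python
def homogeneous_translation(x0, y0, z0):
    """4 x 4 matrix M with M (x, y, z, 1)^T = (x + x0, y + y0, z + z0, 1)^T."""
    return [[1, 0, 0, x0],
            [0, 1, 0, y0],
            [0, 0, 1, z0],
            [0, 0, 0, 1]]


def apply(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v_j for m, v_j in zip(row, v)) for row in M]


# Exercise 7(b): translate (4, -2, 3) to (-1, 7, 0).
start, end = (4, -2, 3), (-1, 7, 0)
M = homogeneous_translation(*(e - s for e, s in zip(end, start)))
print(M)                      # [[1, 0, 0, -5], [0, 1, 0, 9], [0, 0, 1, -3], [0, 0, 0, 1]]
print(apply(M, [*start, 1]))  # [-1, 7, 0, 1]
```

Because translation is now a matrix product, it can be chained with scalings and rotations (each padded to $4 \times 4$) into a single matrix.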


8. For the three rotation matrices given with Views 4, 5, and 6, show that 

$$R^T = R^{-1}$$

(A matrix with this property is called an orthogonal matrix. See Section 7.1.) 



Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a 
basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you 
will be able to use your technology utility to solve many of the problems in the regular exercise sets. 


T1. Let $(a, b, c)$ be a unit vector normal to the plane $ax + by + cz = 0$, and let $\mathbf{r} = (x, y, z)$ be a vector. It 
can be shown that the mirror image of the vector $\mathbf{r}$ through the above plane has coordinates 
$\mathbf{r}_m = (x_m, y_m, z_m)$, where 

$$\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M\begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

with 

$$M = I - 2\mathbf{n}\mathbf{n}^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - 2\begin{bmatrix} a \\ b \\ c \end{bmatrix}\begin{bmatrix} a & b & c \end{bmatrix}$$

(a) Show that $M^2 = I$ and give a physical reason why this must be so. [Hint: Use the fact that $(a, b, c)$ is a 
unit vector to show that $\mathbf{n}^T\mathbf{n} = 1$.] 

(b) Use a computer to show that $\det(M) = -1$. 

(c) The eigenvectors of $M$ satisfy the equation 

$$\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda\begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

and therefore correspond to those vectors whose direction is not affected by a reflection through the plane. 
Use a computer to determine the eigenvectors and eigenvalues of $M$, and then give a physical argument to 
support your answer. 


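T2. A vector $\mathbf{v} = (x, y, z)$ is rotated by an angle $\theta$ about an axis having unit vector $(a, b, c)$, thereby forming
the rotated vector $\mathbf{v}_R = (x_R, y_R, z_R)$. It can be shown that

$$\begin{bmatrix} x_R \\ y_R \\ z_R \end{bmatrix} = R(\theta) \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

with

$$R(\theta) = \cos(\theta)\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} + (1 - \cos(\theta))\begin{bmatrix} a \\ b \\ c \end{bmatrix}\begin{bmatrix} a & b & c \end{bmatrix} + \sin(\theta)\begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}$$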


























(a) Use a computer to show that $R(\theta)R(\phi) = R(\theta + \phi)$, and then give a physical reason why this must be so.
Depending on the sophistication of the computer you are using, you may have to experiment using different
values of $a$, $b$, and $c = \sqrt{1 - a^2 - b^2}$.

(b) Show also that $R^{-1}(\theta) = R(-\theta)$ and give a physical reason why this must be so.

(c) Use a computer to show that $\det(R(\theta)) = +1$.
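Here is a NumPy sketch of the three checks in T2. It assumes Python with NumPy. The axis $(2, -1, 2)/3$ and the two angles are arbitrary illustrative values, and `rotation_matrix` is a hypothetical helper that transcribes the formula for $R(\theta)$ above.

```python
import numpy as np

def rotation_matrix(theta, axis):
    """R(theta) = cos(theta) I + (1 - cos(theta)) u u^T + sin(theta) K,
    where u = (a, b, c) is a unit axis and K is the cross-product matrix of u."""
    a, b, c = np.asarray(axis, dtype=float)
    u = np.array([[a], [b], [c]])
    K = np.array([[0.0, -c, b],
                  [c, 0.0, -a],
                  [-b, a, 0.0]])
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * (u @ u.T)
            + np.sin(theta) * K)

axis = np.array([2.0, -1.0, 2.0]) / 3.0  # illustrative unit axis
theta, phi = 0.7, -1.9                   # arbitrary angles in radians

R_theta = rotation_matrix(theta, axis)
# (a) two successive rotations about the same axis add their angles
print(np.allclose(R_theta @ rotation_matrix(phi, axis), rotation_matrix(theta + phi, axis)))
# (b) undoing a rotation is rotating back by the same angle
print(np.allclose(np.linalg.inv(R_theta), rotation_matrix(-theta, axis)))
# (c) rotations preserve volume and orientation
print(np.isclose(np.linalg.det(R_theta), 1.0))
```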





10.11 Equilibrium Temperature Distributions 

In this section we will see that the equilibrium temperature distribution within a trapezoidal plate can be found 
when the temperatures around the edges of the plate are specified. The problem is reduced to solving a system of 
linear equations. Also, an iterative technique for solving the problem and a “random walk” approach to the 
problem are described. 


Prerequisites 

Linear Systems 
Matrices 

Intuitive Understanding of Limits 


Boundary Data 


Suppose that the two faces of the thin trapezoidal plate shown in Figure 10.11.1a are insulated from heat. Suppose 
that we are also given the temperature along the four edges of the plate. For example, let the temperature be 
constant on each edge with values of 0°, 0°, 1°, and 2°, as in the figure. After a period of time, the temperature
inside the plate will stabilize. Our objective in this section is to determine this equilibrium temperature distribution
at the points inside the plate. As we will see, the interior equilibrium temperature is completely determined by the
boundary data, that is, the temperature along the edges of the plate.



Figure 10.11.1 

The equilibrium temperature distribution can be visualized by the use of curves that connect points of equal 
temperature. Such curves are called isotherms of the temperature distribution. In Figure 10.11.1b we have
sketched a few isotherms, using information we derive later in the chapter.









Although all our calculations will be for the trapezoidal plate illustrated, our techniques generalize easily to a plate 
of any practical shape. They also generalize to the problem of finding the temperature within a three-dimensional 
body. In fact, our “plate” could be the cross section of some solid object if the flow of heat perpendicular to the 
cross section is negligible. For example, Figure 10.11.1 could represent the cross section of a long dam. The dam is 
exposed to three different temperatures: the temperature of the ground at its base, the temperature of the water on 
one side, and the temperature of the air on the other side. A knowledge of the temperature distribution inside the 
dam is necessary to determine the thermal stresses to which it is subjected. 

Next we will consider a certain thermodynamic principle that characterizes the temperature distribution we are 
seeking. 


The Mean-Value Property 

There are many different ways to obtain a mathematical model for our problem. The approach we use is based on 
the following property of equilibrium temperature distributions. 


The Mean-Value Property 

Let a plate be in thermal equilibrium and let P be a point inside the plate. Then if C is any circle with 
center at P that is completely contained in the plate, the temperature at P is the average value of the 
temperature on the circle (Figure 10.11.2). 



This property is a consequence of certain basic laws of molecular motion, and we will not attempt to derive it. 
Basically, this property states that in equilibrium, thermal energy tends to distribute itself as evenly as possible 
consistent with the boundary conditions. It can be shown that the mean-value property uniquely determines the 
equilibrium temperature distribution of a plate. 

Unfortunately, determining the equilibrium temperature distribution from the mean-value property is not an easy 
matter. However, if we restrict ourselves to finding the temperature only at a finite set of points within the plate, 
the problem can be reduced to solving a linear system. We pursue this idea next. 





Discrete Formulation of the Problem 


We can overlay our trapezoidal plate with a succession of finer and finer square nets or meshes (Figure 10.11.3). In
(a) we have a rather coarse net; in (b) we have a net with half the spacing of (a); and in (c) we have a net with
the spacing again reduced by half. The points of intersection of the net lines are called mesh points. We classify
them as boundary mesh points if they fall on the boundary of the plate or as interior mesh points if they lie in the
interior of the plate. For the three net spacings we have chosen, there are 1, 9, and 49 interior mesh points,
respectively.

Figure 10.11.3 (a) 1 interior mesh point; (b) 9 interior mesh points; (c) 49 interior mesh points


In the discrete formulation of our problem, we try to find the temperature only at the interior mesh points of some 
particular net. For a rather fine net, as in (c), this will provide an excellent picture of the temperature distribution 
throughout the entire plate. 

At the boundary mesh points, the temperature is given by the boundary data. (In Figure 10.11.3 we have labeled all 
the boundary mesh points with their corresponding temperatures.) At the interior mesh points, we will apply the 
following discrete version of the mean-value property. 


Discrete Mean-Value Property 

At each interior mesh point, the temperature is approximately the average of the temperatures at the four 
neighboring mesh points. 


This discrete version is a reasonable approximation to the true mean-value property. But because it is only an 
approximation, it will provide only an approximation to the true temperatures at the interior mesh points. However, 
the approximations will get better as the mesh spacing decreases. In fact, as the mesh spacing approaches zero, the 
approximations approach the exact temperature distribution, a fact proved in advanced courses in numerical 
analysis. We will illustrate this convergence by computing the approximate temperatures at the mesh points for the 
three mesh spacings given in Figure 10.11.3. 


Case (a) of Figure 10.11.3 is simple, for there is only one interior mesh point. If we let $t_0$ be the temperature at this
mesh point, the discrete mean-value property immediately gives

$$t_0 = \tfrac{1}{4}(2 + 1 + 0 + 0) = 0.75$$

In case (b) we can label the temperatures at the nine interior mesh points $t_1, t_2, \ldots, t_9$, as in Figure 10.11.3b. (The
particular ordering is not important.) By applying the discrete mean-value property successively to each of these
nine mesh points, we obtain the following nine equations:

$$\begin{aligned}
t_1 &= \tfrac{1}{4}(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac{1}{4}(t_1 + t_3 + t_4 + 2) \\
t_3 &= \tfrac{1}{4}(t_2 + t_5 + 0 + 0) \\
t_4 &= \tfrac{1}{4}(t_2 + t_5 + t_7 + 2) \\
t_5 &= \tfrac{1}{4}(t_3 + t_4 + t_6 + t_8) \\
t_6 &= \tfrac{1}{4}(t_5 + t_9 + 0 + 0) \\
t_7 &= \tfrac{1}{4}(t_4 + t_8 + 1 + 2) \\
t_8 &= \tfrac{1}{4}(t_5 + t_7 + t_9 + 1) \\
t_9 &= \tfrac{1}{4}(t_6 + t_8 + 1 + 0)
\end{aligned} \tag{1}$$

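This is a system of nine linear equations in nine unknowns. We can rewrite it in matrix form as

$$\mathbf{t} = M\mathbf{t} + \mathbf{b} \tag{2}$$

where

$$\mathbf{t} = \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \\ t_6 \\ t_7 \\ t_8 \\ t_9 \end{bmatrix}, \quad
M = \frac{1}{4}\begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} \tfrac12 \\ \tfrac12 \\ 0 \\ \tfrac12 \\ 0 \\ 0 \\ \tfrac34 \\ \tfrac14 \\ \tfrac14 \end{bmatrix}$$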


To solve Equation 2, we write it as

$$(I - M)\mathbf{t} = \mathbf{b}$$

The solution for $\mathbf{t}$ is thus

$$\mathbf{t} = (I - M)^{-1}\mathbf{b} \tag{3}$$

as long as the matrix $(I - M)$ is invertible. This is indeed the case, and the solution for $\mathbf{t}$ as calculated by 3 is

$$\mathbf{t} = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix} \tag{4}$$

Figure 10.11.4 is a diagram of the plate with the nine interior mesh points labeled with their temperatures as given 
by this solution. 


Figure 10.11.4
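As a check on Equation 4, here is a short sketch that assembles $M$ and $\mathbf{b}$ from Equations 1 and solves $(I - M)\mathbf{t} = \mathbf{b}$. It assumes Python with NumPy. The code calls a linear solver instead of forming $(I - M)^{-1}$ explicitly, which is the usual numerical practice and gives the same $\mathbf{t}$ as Equation 3. The neighbour table is simply Equations 1 transcribed.

```python
import numpy as np

# Interior neighbours of each t_i, read off Equations 1 (1-based, as in the text).
neighbours = {
    1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 5, 7], 5: [3, 4, 6, 8],
    6: [5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8],
}
M = np.zeros((9, 9))
for i, nbrs in neighbours.items():
    for j in nbrs:
        M[i - 1, j - 1] = 0.25

# b holds 1/4 of the sum of the boundary temperatures in each of Equations 1.
b = np.array([2, 2, 0, 2, 0, 0, 3, 1, 1]) / 4.0

# Solve (I - M) t = b directly rather than computing the inverse.
t = np.linalg.solve(np.eye(9) - M, b)
print(np.round(t, 4))
```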

For case (c) of Figure 10.11.3, we repeat this same procedure. We label the temperatures at the 49 interior mesh
points as $t_1, t_2, \ldots, t_{49}$ in some manner. For example, we may begin at the top of the plate and proceed from left to
right along each row of mesh points. Applying the discrete mean-value property to each mesh point gives a system
of 49 linear equations in 49 unknowns:

$$\begin{aligned}
t_1 &= \tfrac{1}{4}(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac{1}{4}(t_1 + t_3 + t_4 + 2) \\
&\;\,\vdots \\
t_{48} &= \tfrac{1}{4}(t_{41} + t_{47} + t_{49} + 1) \\
t_{49} &= \tfrac{1}{4}(t_{42} + t_{48} + 0 + 1)
\end{aligned} \tag{5}$$

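In matrix form, Equations 5 are

$$\mathbf{t} = M\mathbf{t} + \mathbf{b}$$

where $\mathbf{t}$ and $\mathbf{b}$ are column vectors with 49 entries, and $M$ is a $49 \times 49$ matrix. As in 3, the solution for $\mathbf{t}$ is

$$\mathbf{t} = (I - M)^{-1}\mathbf{b} \tag{6}$$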


In Figure 10.11.5 we display the temperatures at the 49 mesh points found by Equation 6. The nine unshaded 
temperatures in this figure fall on the mesh points of Figure 10.11.4. 
Figure 10.11.5


In Table 1 we compare the temperatures at these nine common mesh points for the three different mesh spacings 
used. 


Table 1 Temperatures at Common Mesh Points

           Case (a)    Case (b)    Case (c)
$t_1$        -          0.7846      0.8048
$t_2$        -          1.1383      1.1533
$t_3$        -          0.4719      0.4778
$t_4$        -          1.2967      1.3078
$t_5$      0.7500       0.7491      0.7513
$t_6$        -          0.3265      0.3157
$t_7$        -          1.2995      1.3042
$t_8$        -          0.9014      0.9032
$t_9$        -          0.5570      0.5554


Knowing that the temperatures of the discrete problem approach the exact temperatures as the mesh spacing 
decreases, we may surmise that the nine temperatures obtained in case (c) are closer to the exact values than those 
in case (b). 
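The three meshes can also be generated programmatically. The sketch below is an illustration built on an assumption. It places the plate's corners at (0, 0), (2m, 0), (2m, m), and (0, 3m) in grid units, with 2° on the left edge, 1° on the bottom, and 0° on the right and slanted edges. This geometry is inferred from Equations 1 and 5, because the text does not state coordinates. With m = 1, 2, 4 it reproduces the counts 1, 9, and 49 and the equations shown. The names `trapezoid_system` and `solve_plate` are hypothetical. You can compare the printed case (c) column with Table 1.

```python
import numpy as np

def trapezoid_system(m):
    """Build t = M t + b for the assumed trapezoid at refinement level m (1, 2, 4 = cases a, b, c)."""
    def interior(x, y):
        return 0 < x < 2 * m and y > 0 and x + y < 3 * m

    def boundary_temp(x, y):
        if x == 0:
            return 2.0   # left edge
        if y == 0:
            return 1.0   # bottom edge
        return 0.0       # right edge or slanted edge

    # Number points from the top row down, left to right within each row (as in the text).
    points = [(x, y) for y in range(3 * m - 1, 0, -1)
              for x in range(1, 2 * m) if interior(x, y)]
    index = {p: k for k, p in enumerate(points)}
    M = np.zeros((len(points), len(points)))
    b = np.zeros(len(points))
    for k, (x, y) in enumerate(points):
        for nb in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if nb in index:
                M[k, index[nb]] = 0.25
            else:
                b[k] += 0.25 * boundary_temp(*nb)
    return points, M, b

def solve_plate(m):
    points, M, b = trapezoid_system(m)
    t = np.linalg.solve(np.eye(len(points)) - M, b)
    return dict(zip(points, t))

# The nine common mesh points in case-(b) grid units, listed as t_1, ..., t_9.
common_b = [(1, 4), (1, 3), (2, 3), (1, 2), (2, 2), (3, 2), (1, 1), (2, 1), (3, 1)]
results = {m: solve_plate(m) for m in (1, 2, 4)}

for label, (x, y) in enumerate(common_b, start=1):
    cells = []
    for m in (1, 2, 4):
        if (x * m) % 2 == 0 and (y * m) % 2 == 0:   # point exists on this mesh
            cells.append(f"{results[m][(x * m // 2, y * m // 2)]:8.4f}")
        else:
            cells.append(f"{'-':>8}")
    print(f"t{label}  " + "  ".join(cells))
```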


A Numerical Technique 

To obtain the 49 temperatures in case ( c ) of Figure 10.11.3, it was necessary to solve a linear system with 49 
unknowns. A finer net might involve a linear system with hundreds or even thousands of unknowns. Exact 
algorithms for the solutions of such large systems are impractical, and for this reason we now discuss a numerical 
technique for the practical solution of these systems. 

To describe this technique, we look again at Equation 2: 

$$\mathbf{t} = M\mathbf{t} + \mathbf{b} \tag{7}$$

The vector $\mathbf{t}$ we are seeking appears on both sides of this equation. We consider a way of generating better and
better approximations to the vector solution $\mathbf{t}$. For the initial approximation we can take $\mathbf{t}^{(0)} = \mathbf{0}$ if no better
choice is available. If we substitute $\mathbf{t}^{(0)}$ into the right side of 7 and label the resulting left side as $\mathbf{t}^{(1)}$, we have

$$\mathbf{t}^{(1)} = M\mathbf{t}^{(0)} + \mathbf{b} \tag{8}$$

If we substitute $\mathbf{t}^{(1)}$ into the right side of 7, we generate another approximation, which we label $\mathbf{t}^{(2)}$:

$$\mathbf{t}^{(2)} = M\mathbf{t}^{(1)} + \mathbf{b} \tag{9}$$

Continuing in this way, we generate a sequence of approximations as follows:

$$\begin{aligned}
\mathbf{t}^{(1)} &= M\mathbf{t}^{(0)} + \mathbf{b} \\
\mathbf{t}^{(2)} &= M\mathbf{t}^{(1)} + \mathbf{b} \\
\mathbf{t}^{(3)} &= M\mathbf{t}^{(2)} + \mathbf{b} \\
&\;\,\vdots \\
\mathbf{t}^{(n)} &= M\mathbf{t}^{(n-1)} + \mathbf{b}
\end{aligned} \tag{10}$$

One would hope that this sequence of approximations $\mathbf{t}^{(0)}, \mathbf{t}^{(1)}, \mathbf{t}^{(2)}, \ldots$ converges to the exact solution of 7. We do
not have the space here to go into the theoretical considerations necessary to show this. Suffice it to say that for the
particular problem we are considering, the sequence converges to the exact solution for any mesh size and for any
initial approximation $\mathbf{t}^{(0)}$.


This technique of generating successive approximations to the solution of 7 is a variation of a technique called
Jacobi iteration; the approximations themselves are called iterates. As a numerical example, let us apply Jacobi
iteration to the calculation of the nine mesh point temperatures of case (b). Setting $\mathbf{t}^{(0)} = \mathbf{0}$, we have, from
Equation 2,

$$\mathbf{t}^{(1)} = M\mathbf{t}^{(0)} + \mathbf{b} = M\mathbf{0} + \mathbf{b} = \mathbf{b} = \begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix}$$

$$\mathbf{t}^{(2)} = M\mathbf{t}^{(1)} + \mathbf{b} = M\begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix} + \begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix} = \begin{bmatrix} .6250 \\ .7500 \\ .1250 \\ .8125 \\ .1875 \\ .0625 \\ .9375 \\ .5000 \\ .3125 \end{bmatrix}$$







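Some additional iterates are

$$\mathbf{t}^{(3)} = \begin{bmatrix} 0.6875 \\ 0.8906 \\ 0.2344 \\ 0.9688 \\ 0.3750 \\ 0.1250 \\ 1.0781 \\ 0.6094 \\ 0.3906 \end{bmatrix}, \quad
\mathbf{t}^{(10)} = \begin{bmatrix} 0.7791 \\ 1.1230 \\ 0.4573 \\ 1.2770 \\ 0.7236 \\ 0.3131 \\ 1.2848 \\ 0.8827 \\ 0.5446 \end{bmatrix}, \quad
\mathbf{t}^{(20)} = \begin{bmatrix} 0.7845 \\ 1.1380 \\ 0.4716 \\ 1.2963 \\ 0.7486 \\ 0.3263 \\ 1.2992 \\ 0.9010 \\ 0.5567 \end{bmatrix}, \quad
\mathbf{t}^{(30)} = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix}$$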


All iterates beginning with the thirtieth are equal to $\mathbf{t}^{(30)}$ to four decimal places. Consequently, $\mathbf{t}^{(30)}$ is the exact
solution to four decimal places. This agrees with our previous result given in Equation 4.

The Jacobi iteration scheme applied to the linear system 5 with 49 unknowns produces iterates that begin repeating
to four decimal places after 119 iterations. Thus, $\mathbf{t}^{(119)}$ would provide the 49 temperatures of case (c) correct to
four decimal places.
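Equations 10 translate directly into a loop. The NumPy sketch below reproduces the iterates of case (b) starting from $\mathbf{t}^{(0)} = \mathbf{0}$. The helper names are hypothetical. The stopping test at the end compares successive iterates, which is one common heuristic and not the only possible criterion.

```python
import numpy as np

# Interior neighbours of each t_i in Equations 1 (numbering as in the text).
NEIGHBOURS = {
    1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 5, 7], 5: [3, 4, 6, 8],
    6: [5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8],
}

def case_b_system():
    """Return M and b of Equation 2."""
    M = np.zeros((9, 9))
    for i, nbrs in NEIGHBOURS.items():
        for j in nbrs:
            M[i - 1, j - 1] = 0.25
    b = np.array([2, 2, 0, 2, 0, 0, 3, 1, 1]) / 4.0  # 1/4 of the boundary terms
    return M, b

def jacobi_iterates(M, b, t0, count):
    """Return [t^(0), t^(1), ..., t^(count)] with t^(k) = M t^(k-1) + b (Equations 10)."""
    iterates = [np.asarray(t0, dtype=float)]
    for _ in range(count):
        iterates.append(M @ iterates[-1] + b)
    return iterates

M, b = case_b_system()
iterates = jacobi_iterates(M, b, np.zeros(9), 30)
for k in (1, 2, 3, 30):
    print(k, np.round(iterates[k], 4))

# Heuristic stopping rule: first k at which successive iterates agree to within 5e-5.
for k in range(1, len(iterates)):
    if np.max(np.abs(iterates[k] - iterates[k - 1])) < 5e-5:
        print("successive iterates agree to within 5e-5 at k =", k)
        break
```

The same function applies unchanged to the 49-unknown system of case (c) once its $M$ and $\mathbf{b}$ are assembled. Each iteration costs only one matrix-vector product, which is the practical advantage over a direct solve.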


A Monte Carlo Technique 

In this section we describe a so-called Monte Carlo technique for computing the temperature at a single interior 
mesh point of the discrete problem without having to compute the temperatures at the remaining interior mesh 
points. First we define a discrete random walk along the net. By this we mean a directed path along the net lines 
(Figure 10.11.6) that joins a succession of mesh points such that the direction of departure from each mesh point is 
chosen at random. Each of the four possible directions of departure from each mesh point along the path is to be 
equally probable. 
Figure 10.11.6

By the use of random walks, we can compute the temperature at a specified interior mesh point on the basis of the 
following property. 


Random Walk Property 










Let $W_1, W_2, \ldots, W_n$ be a succession of random walks, all of which begin at a specified interior mesh point.
Let $t_1^*, t_2^*, \ldots, t_n^*$ be the temperatures at the boundary mesh points first encountered along each of these
random walks. Then the average value $(t_1^* + t_2^* + \cdots + t_n^*)/n$ of these boundary temperatures approaches
the temperature at the specified interior mesh point as the number of random walks $n$ increases without
bound.


This property is a consequence of the discrete mean-value property that the mesh point temperatures satisfy. The 
proof of the random walk property involves elementary concepts from probability theory, and we will not give it 
here. 


In Table 2 we display the results of a large number of computer-generated random walks for the evaluation of the
temperature $t_5$ of the nine-point mesh of case (b) in Figure 10.11.6. The first column lists the number $n$ of the
random walk. The second column lists the temperature $t_n^*$ of the boundary point first encountered along the
corresponding random walk. The last column contains the cumulative average of the boundary temperatures
encountered along the $n$ random walks. Thus, after 1000 random walks we have the approximation $t_5 \approx .7550$.
This compares with the exact value $t_5 = .7491$ that we had previously evaluated. As can be seen, the convergence
to the exact value is not very rapid.

Table 2

     n     $t_n^*$    $(t_1^* + \cdots + t_n^*)/n$
     1       1         1.0000
     2       2         1.5000
     3       1         1.3333
     4       0         1.0000
     5       2         1.2000
     6       0         1.0000
     7       2         1.1429
     8       0         1.0000
     9       2         1.1111
    10       0         1.0000
    20       1         0.9500
    30       0         0.8000
    40       0         0.8250
    50       2         0.8400
   100       0         0.8300
   150       1         0.8000
   200       0         0.8050
   250       1         0.8240
   500       1         0.7860
  1000       0         0.7550
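Here is a minimal simulation of the random walk property for $t_5$. It assumes Python's standard `random` module. Instead of modelling the plate geometry, it reads each interior point's four neighbours directly from Equations 1. Each point has exactly one neighbour per direction, so a uniform choice of neighbour is a uniform choice of direction. The estimates depend on the seed, and, as in Table 2, they approach .7491 slowly.

```python
import random

# Neighbours of each interior point t_i, read off Equations 1:
# interior neighbours by index, boundary neighbours by their temperature.
INTERIOR_NEIGHBOURS = {
    1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 5, 7], 5: [3, 4, 6, 8],
    6: [5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8],
}
BOUNDARY_NEIGHBOURS = {
    1: [2, 0, 0], 2: [2], 3: [0, 0], 4: [2], 5: [],
    6: [0, 0], 7: [1, 2], 8: [1], 9: [1, 0],
}

def random_walk(start, rng):
    """Follow one random walk from interior point `start`; return the temperature
    of the first boundary mesh point it reaches."""
    point = start
    while True:
        options = ([("interior", j) for j in INTERIOR_NEIGHBOURS[point]]
                   + [("boundary", T) for T in BOUNDARY_NEIGHBOURS[point]])
        kind, value = rng.choice(options)  # each of the four directions equally likely
        if kind == "boundary":
            return value
        point = value

def monte_carlo_estimate(start, n_walks, seed=0):
    """Average the boundary temperatures reached by n_walks independent walks."""
    rng = random.Random(seed)
    return sum(random_walk(start, rng) for _ in range(n_walks)) / n_walks

for n in (10, 100, 1000, 10000):
    print(n, round(monte_carlo_estimate(5, n), 4))
```

The statistical error of such an average shrinks only like $1/\sqrt{n}$, which is consistent with the slow convergence seen in Table 2.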


Exercise Set 10.11 

1. A plate in the form of a circular disk has boundary temperatures of 0° on the left half of its circumference and 1° on
the right half of its circumference. A net with four interior mesh points is overlaid on the disk (see Figure
Ex-1).

(a) Using the discrete mean-value property, write the $4 \times 4$ linear system $\mathbf{t} = M\mathbf{t} + \mathbf{b}$ that determines the
approximate temperatures at the four interior mesh points.

(b) Solve the linear system in part (a).

(c) Use the Jacobi iteration scheme with $\mathbf{t}^{(0)} = \mathbf{0}$ to generate the iterates $\mathbf{t}^{(1)}, \mathbf{t}^{(2)}, \mathbf{t}^{(3)}, \mathbf{t}^{(4)}$, and $\mathbf{t}^{(5)}$ for the
linear system in part (a). What is the "error vector" $\mathbf{t}^{(5)} - \mathbf{t}$, where $\mathbf{t}$ is the solution found in part (b)?

(d) By certain advanced methods, it can be determined that the exact temperatures to four decimal places at the
four mesh points are $t_1 = t_3 = .2871$ and $t_2 = t_4 = .7129$. What are the percentage errors in the values
found in part (b)?

Answer: (d) for $t_1$ and $t_3$, −12.9%; for $t_2$ and $t_4$, 5.2%

2. Use Theorem 10.11.1 to find the exact equilibrium temperature at the center of the disk in Exercise 1. 


Answer: 1/2

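3. Calculate the first two iterates $\mathbf{t}^{(1)}$ and $\mathbf{t}^{(2)}$ for case (b) of Figure 10.11.3 with nine interior mesh points
[Equation 2] when the initial iterate is chosen as

$$\mathbf{t}^{(0)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^T$$

Answer:

$$\mathbf{t}^{(1)} = \begin{bmatrix} \tfrac34 & \tfrac54 & \tfrac12 & \tfrac54 & 1 & \tfrac12 & \tfrac54 & 1 & \tfrac34 \end{bmatrix}^T, \qquad
\mathbf{t}^{(2)} = \begin{bmatrix} \tfrac{13}{16} & \tfrac{9}{8} & \tfrac{9}{16} & \tfrac{11}{8} & \tfrac{13}{16} & \tfrac{7}{16} & \tfrac{21}{16} & 1 & \tfrac{5}{8} \end{bmatrix}^T$$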
4. The random walk illustrated in Figure Ex-4a can be described by six arrows

← ↓ → → ↑ →

that specify the directions of departure from the successive mesh points along the path. Figure Ex-4b is an array
of 100 computer-generated, randomly oriented arrows arranged in a $10 \times 10$ array. Use these arrows to
determine random walks to approximate the temperature $t_5$, as in Table 2. Proceed as follows:

1. Take the last two digits of your telephone number. Use the last digit to specify a row and the other to specify
a column.

2. Go to the arrow in the array with that row and column number.

3. Using this arrow as a starting point, move through the array of arrows as you would read a book (left to right
and top to bottom). Beginning at the point labeled $t_5$ in Figure Ex-4a and using this sequence of arrows to
specify a sequence of directions, move from mesh point to mesh point until you reach a boundary mesh
point. This completes your first random walk. Record the temperature at the boundary mesh point. (If you
reach the end of the arrow array, continue with the arrow in the upper left corner.)

4. Return to the interior mesh point labeled $t_5$ and begin where you left off in the arrow array; generate your
next random walk. Repeat this process until you have completed 10 random walks and have recorded 10
boundary temperatures.

5. Calculate the average of the 10 boundary temperatures recorded. (The exact value is $t_5 = .7491$.)
Figure Ex-4






Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 


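T1. Suppose that we have the square region described by

$$R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}$$

and suppose that the equilibrium temperature distribution $u(x, y)$ along the boundary is given by $u(x, 0) = T_B$,
$u(x, 1) = T_T$, $u(0, y) = T_L$, and $u(1, y) = T_R$. Suppose next that this region is partitioned into an
$(n + 1) \times (n + 1)$ mesh using

$$x_i = \frac{i}{n} \quad \text{and} \quad y_j = \frac{j}{n}$$

for $i = 0, 1, 2, \ldots, n$ and $j = 0, 1, 2, \ldots, n$. If the temperatures of the interior mesh points are labeled by

$$u_{i,j} = u(x_i, y_j) = u(i/n, j/n)$$

then show that

$$u_{i,j} = \tfrac{1}{4}\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right)$$

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$. To handle the boundary points, define

$$u_{0,j} = T_L, \quad u_{n,j} = T_R, \quad u_{i,0} = T_B, \quad u_{i,n} = T_T$$

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$. Next let

$$F_{n+1} = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix}$$

be the $(n + 1) \times (n + 1)$ matrix with the $n \times n$ identity matrix in the upper right-hand corner, a one in the lower
left-hand corner, and zeros everywhere else. For example,

$$F_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
F_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad
F_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \quad
F_5 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}$$

and so on. By defining the $(n + 1) \times (n + 1)$ matrix

$$M_{n+1} = F_{n+1} + F_{n+1}^T = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ I_n & 0 \end{bmatrix}$$

show that if $U_{n+1}$ is the $(n + 1) \times (n + 1)$ matrix with entries $u_{ij}$, then the set of equations

$$u_{i,j} = \tfrac{1}{4}\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right)$$

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$ can be written as the matrix equation

$$U_{n+1} = \tfrac{1}{4}\left(M_{n+1}U_{n+1} + U_{n+1}M_{n+1}\right)$$

where we consider only those elements of $U_{n+1}$ with $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$.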
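Before proving the identity, a quick numerical sanity check can help. The NumPy sketch below builds $F_{n+1}$ and $M_{n+1}$, fills $U$ with arbitrary random test values (the identity does not depend on them), and compares $\tfrac14(MU + UM)$ with the neighbour average at the interior entries. The name `shift_matrix` is a hypothetical helper.

```python
import numpy as np

def shift_matrix(size):
    """F_{n+1}: ones just above the diagonal (the I_n block) plus a one in the lower-left corner."""
    F = np.eye(size, k=1)
    F[-1, 0] = 1.0
    return F

n = 5
F = shift_matrix(n + 1)
M = F + F.T
rng = np.random.default_rng(0)
U = rng.random((n + 1, n + 1))  # arbitrary test values for u_{ij}

lhs = 0.25 * (M @ U + U @ M)

# Neighbour average written out entry by entry for the interior indices.
avg = np.zeros_like(U)
for i in range(1, n):
    for j in range(1, n):
        avg[i, j] = 0.25 * (U[i - 1, j] + U[i + 1, j] + U[i, j - 1] + U[i, j + 1])

print(np.allclose(lhs[1:n, 1:n], avg[1:n, 1:n]))
```

Left multiplication by $M_{n+1}$ adds the rows above and below an entry, and right multiplication adds the columns to its left and right. The wrap-around ones in the corners only affect edge entries, which the exercise excludes.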

T2. The results of the preceding exercise and the discussion in the text suggest the following algorithm for solving
for the equilibrium temperature in the square region

$$R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}$$

given the boundary conditions

$$u(x, 0) = T_B, \quad u(x, 1) = T_T, \quad u(0, y) = T_L, \quad u(1, y) = T_R$$

1. Choose a value for $n$, and then choose an initial guess, say

$$U^{(0)}_{n+1} = \begin{bmatrix}
0 & T_L & \cdots & T_L & 0 \\
T_B & 0 & \cdots & 0 & T_T \\
\vdots & \vdots & & \vdots & \vdots \\
T_B & 0 & \cdots & 0 & T_T \\
0 & T_R & \cdots & T_R & 0
\end{bmatrix}$$

2. For $k = 0, 1, 2, \ldots$, compute $U^{(k+1)}_{n+1}$ using

$$U^{(k+1)}_{n+1} = \tfrac{1}{4}\left(M_{n+1}U^{(k)}_{n+1} + U^{(k)}_{n+1}M_{n+1}\right)$$

where $M_{n+1}$ is as defined in Exercise T1. Then adjust $U^{(k+1)}_{n+1}$ by replacing all edge entries by the initial edge
entries in $U^{(0)}_{n+1}$. [Note: The edge entries of a matrix are the entries in the first and last columns and first and
last rows.]

3. Continue this process until $U^{(k+1)}_{n+1} - U^{(k)}_{n+1}$ is approximately the zero matrix. This suggests that

$$U_{n+1} = \lim_{k \to \infty} U^{(k)}_{n+1}$$


Use a computer and this algorithm to solve for $u(x, y)$ given that

$$u(x, 0) = 0, \quad u(x, 1) = 0, \quad u(0, y) = 0, \quad u(1, y) = 2$$

Choose $n = 6$ and compute the iterates $U^{(k)}_{n+1}$. The exact solution can be expressed as

$$u(x, y) = \frac{8}{\pi}\sum_{m=1}^{\infty} \frac{\sinh[(2m - 1)\pi x]\,\sin[(2m - 1)\pi y]}{(2m - 1)\sinh[(2m - 1)\pi]}$$

Use a computer to compute $u(i/6, j/6)$ for $i, j = 0, 1, 2, 3, 4, 5, 6$, and then compare your results to the values
of $u(i/6, j/6)$ in your final iterate $U^{(k)}_{n+1}$.
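The NumPy sketch below is one possible rendering of Steps 1 to 3. It assumes, as in Exercise T1, that the row index $i$ corresponds to $x$ and the column index $j$ to $y$. The series is truncated at 200 terms, which is an arbitrary choice. The code also rewrites $\sinh(ax)/\sinh(a)$ with exponentials, because $\sinh$ overflows in floating point for large arguments. Function names are hypothetical. With $n = 6$ there is a mesh point at the centre of the square. Superposing the four rotated copies of the problem gives the all-2 boundary, whose solution is identically 2, so both the discrete and exact solutions equal 1/2 there. That makes the centre a convenient check.

```python
import math
import numpy as np

def shift_matrix(size):
    F = np.eye(size, k=1)
    F[-1, 0] = 1.0
    return F

def initial_guess(n, TB, TT, TL, TR):
    """U^(0): boundary values on the edges, zeros inside and at the four corners."""
    U = np.zeros((n + 1, n + 1))
    U[0, 1:n] = TL   # u_{0,j}, x = 0
    U[n, 1:n] = TR   # u_{n,j}, x = 1
    U[1:n, 0] = TB   # u_{i,0}, y = 0
    U[1:n, n] = TT   # u_{i,n}, y = 1
    return U

def iterate_square(n, TB, TT, TL, TR, tol=1e-12, max_iter=100_000):
    """Repeat U <- (M U + U M)/4 and restore the edge entries until U stops changing."""
    F = shift_matrix(n + 1)
    M = F + F.T
    U0 = initial_guess(n, TB, TT, TL, TR)
    edge = np.ones(U0.shape, dtype=bool)
    edge[1:n, 1:n] = False
    U = U0.copy()
    for k in range(1, max_iter + 1):
        U_next = 0.25 * (M @ U + U @ M)
        U_next[edge] = U0[edge]
        if np.max(np.abs(U_next - U)) < tol:
            return U_next, k
        U = U_next
    raise RuntimeError("iteration did not settle within max_iter steps")

def exact_u(x, y, terms=200):
    """Partial sum of the series solution for u(1, y) = 2, u = 0 on the other edges."""
    total = 0.0
    for k in range(1, terms + 1):
        a = (2 * k - 1) * math.pi
        # sinh(a x)/sinh(a) written with exponentials to avoid overflow for large a
        ratio = math.exp(a * (x - 1.0)) * (1.0 - math.exp(-2.0 * a * x)) / (1.0 - math.exp(-2.0 * a))
        total += ratio * math.sin(a * y) / (2 * k - 1)
    return 8.0 / math.pi * total

n = 6
U, steps = iterate_square(n, TB=0.0, TT=0.0, TL=0.0, TR=2.0)
exact = np.array([[exact_u(i / n, j / n) for j in range(n + 1)] for i in range(n + 1)])
print(steps)
print(np.round(U, 4))
print(np.max(np.abs(U[1:n, 1:n] - exact[1:n, 1:n])))  # discretization error at interior points
```

The interior differences reflect the coarse mesh ($h = 1/6$), not a failure of the iteration. They shrink as $n$ grows, in line with the convergence discussion in the text.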

T3. Using the exact solution u(x, y ) for the temperature distribution described in Exercise T2 , use a graphing 
program to do the following: 

(a) Plot the surface $z = u(x, y)$ in three-dimensional $xyz$-space in which $z$ is the temperature at the point $(x, y)$ in
the square region.

(b) Plot several isotherms of the temperature distribution (curves in the $xy$-plane over which the temperature is a
constant).

(c) Plot several curves of the temperature as a function of $x$ with $y$ held constant.

(d) Plot several curves of the temperature as a function of $y$ with $x$ held constant.
10.12 Computed Tomography

In this section we will see how constructing a cross-sectional view of a human body by analyzing X-ray scans leads to an inconsistent linear 
system. We present an iteration technique that provides an “approximate solution” of the linear system. 


Prerequisites 

Linear Systems 
Natural Logarithms 
Euclidean Space $R^n$


The basic problem of computed tomography is to construct an image of a cross section of the human body using data collected from many 
individual beams of X rays that are passed through the cross section. These data are processed by a computer, and the computed cross section is 
displayed on a video monitor. Figure 10.12.1 is a diagram of General Electric's CT system showing a patient prepared to have a cross section of 
his head scanned by X-ray beams. 



Figure 10.12.1 

Such a system is also known as a CAT scanner, for Computer-Aided Tomography scanner. Figure 10.12.2 shows a typical cross section of a
human head produced by the system. 



Figure 10.12.2 

The first commercial system of computed tomography for medical use was developed in 1971 by G. N. Hounsfield of EMI, Ltd., in England. In 
1979, Hounsfield and A. M. Cormack were awarded the Nobel Prize for their pioneering work in the field. As we will see in this section, the
construction of a cross section, or tomograph, requires the solution of a large linear system of equations. Certain algorithms, called algebraic 
reconstruction techniques (ARTs), can be used to solve these linear systems, whose solutions yield the cross sections in digital form. 


Scanning Modes 

Unlike conventional X-ray pictures that are formed by X rays that are projected perpendicular to the plane of the picture, tomographs are 
constructed from thousands of individual, hairline-thin X-ray beams that lie in the plane of the cross section. After they pass through the cross 
section, the intensities of the X-ray beams are measured by an X-ray detector, and these measurements are relayed to a computer where they are
processed. Figures 10.12.3 and 10.12.4 illustrate two possible modes of scanning the cross section: the parallel mode and the fan-beam mode.

In the parallel mode a single X-ray source and X-ray detector pair are translated across the field of view containing the cross section, and many 
measurements of the parallel beams are recorded. Then the source and detector pair are rotated through a small angle, and another set of 
measurements is taken. This is repeated until the desired number of beam measurements is completed. For example, in the original 1971 
machine, 160 parallel measurements were taken through 180 angles spaced 1° apart: a total of 160 × 180 = 28,800 beam measurements. Each



Figure 10.12.3 Parallel mode of scanning (X-ray source and X-ray detector)



In the fan-beam mode of scanning, a single X-ray tube generates a fan of collimated beams whose intensities are measured simultaneously by 
an array of detectors on the other side of the field of view. The X-ray tube and detector array are rotated through many angles, and a set of 
measurements is taken at each angle until the scan is completed. In the General Electric CT system, which uses the fan-beam mode, each scan 
takes 1 second. 


Derivation of Equations 

To see how the cross section is reconstructed from the many individual beam measurements, refer to Figure 10.12.5. Here the field of view in 
which the cross section is situated has been divided into many square pixels (picture elements) numbered 1 through N as indicated. We wish to
determine the X-ray density of each pixel. In the EMI system, 6400 pixels were used, arranged in a square 80 × 80 array. The G.E. CT
system uses 262,144 pixels in a 512 × 512 array, each pixel being about 1 mm on a side. After the densities of the pixels are determined by the
method we will describe, they are reproduced on a video monitor, with each pixel shaded a level of gray proportional to its X-ray density. 
Because different tissues within the human body have different X-ray densities, the video display clearly distinguishes the various tissues and 
organs within the cross section. 



Figure 10.12.6 shows a single pixel with an X-ray beam of roughly the same width as the pixel passing squarely through it. The photons
constituting the X-ray beam are absorbed by the tissue within the pixel at a rate proportional to the X-ray density of the tissue. Quantitatively,
the X-ray density of the $j$th pixel is denoted by $x_j$ and is defined by

$$x_j = \ln\left(\frac{\text{number of photons entering the } j\text{th pixel}}{\text{number of photons leaving the } j\text{th pixel}}\right)$$

where "ln" denotes the natural logarithmic function. Using the logarithm property $\ln(a/b) = -\ln(b/a)$, we also have

$$x_j = -\ln\left(\begin{array}{c} \text{fraction of photons that pass through} \\ \text{the } j\text{th pixel without being absorbed} \end{array}\right)$$



Figure 10.12.6 (photons entering and leaving the $j$th pixel)

If the X-ray beam passes through an entire row of pixels (Figure 10.12.7), then the number of photons leaving one pixel is equal to the number
of photons entering the next pixel in the row. If the pixels are numbered $1, 2, \ldots, n$, then the additive property of the logarithmic function gives

$$x_1 + x_2 + \cdots + x_n = \ln\left(\frac{\text{number of photons entering the first pixel}}{\text{number of photons leaving the } n\text{th pixel}}\right) = -\ln\left(\begin{array}{c} \text{fraction of photons that pass} \\ \text{through the row of } n \text{ pixels} \\ \text{without being absorbed} \end{array}\right) \tag{1}$$


Thus, to determine the total X-ray density of a row of pixels, we simply sum the individual pixel densities. 
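Here is a small numerical illustration of Equation 1 using made-up photon counts. The count leaving one pixel is the count entering the next, so the logarithms telescope. Their sum therefore equals minus the logarithm of the overall fraction transmitted. The name `pixel_densities` is a hypothetical helper.

```python
import math

def pixel_densities(counts):
    """x_j = ln(photons entering pixel j / photons leaving pixel j) along one row.
    counts[0] enters the first pixel; counts[j + 1] leaves pixel j and enters pixel j + 1."""
    return [math.log(counts[j] / counts[j + 1]) for j in range(len(counts) - 1)]

# Hypothetical photon counts along a row of four pixels.
counts = [100_000, 81_000, 60_000, 52_000, 40_000]
x = pixel_densities(counts)

row_density = sum(x)
fraction_through = counts[-1] / counts[0]
print(row_density, -math.log(fraction_through))  # the two sides of Equation 1
```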



Figure 10.12.7 


Next, consider the X-ray beam in Figure 10.12.5. By the beam density of the $i$th beam of a scan, denoted by $b_i$, we mean

$$b_i = \ln\left(\frac{\begin{array}{c} \text{number of photons of the } i\text{th beam entering the detector} \\ \text{without the cross section in the field of view} \end{array}}{\begin{array}{c} \text{number of photons of the } i\text{th beam entering the detector} \\ \text{with the cross section in the field of view} \end{array}}\right) = -\ln\left(\begin{array}{c} \text{fraction of photons of the } i\text{th beam that} \\ \text{pass through the cross section without} \\ \text{being absorbed} \end{array}\right) \tag{2}$$


The numerator in the first expression for $b_i$ is obtained by performing a calibration scan without the cross section in the field of view. The
resulting detector measurements are stored within the computer's memory. Then a clinical scan is performed with the cross section in the field
of view, the $b_i$'s of all the beams constituting the scan are computed, and the values are stored for further processing.


For each beam that passes squarely through a row of pixels, we must have

$$\left(\begin{array}{c} \text{fraction of photons of the} \\ \text{beam that pass through the} \\ \text{row of pixels without being} \\ \text{absorbed} \end{array}\right) = \left(\begin{array}{c} \text{fraction of photons of the} \\ \text{beam that pass through the} \\ \text{cross section without being} \\ \text{absorbed} \end{array}\right)$$

Thus, if the $i$th beam passes squarely through a row of $n$ pixels, then it follows from Equations 1 and 2 that

$$x_1 + x_2 + \cdots + x_n = b_i$$

In this equation, $b_i$ is known from the clinical and calibration measurements, and $x_1, x_2, \ldots, x_n$ are unknown pixel densities that must be
determined.


More generally, if the $i$th beam passes squarely through a row (or column) of pixels with numbers $j_1, j_2, \ldots, j_i$, then we have

$$x_{j_1} + x_{j_2} + \cdots + x_{j_i} = b_i$$

If we set

$$a_{ij} = \begin{cases} 1, & j = j_1, j_2, \ldots, j_i \\ 0, & \text{otherwise} \end{cases}$$

then we can write this equation as

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{iN}x_N = b_i \tag{3}$$


We will refer to Equation 3 as the ith beam equation. 
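To make the coefficients $a_{ij}$ concrete, the sketch below handles the simplest situation the text describes: beams that pass squarely through pixel rows and columns. It uses a hypothetical 3 × 3 pixel grid numbered row by row (0-based in the code), and the densities are invented for illustration. With only row and column beams there are 2n equations for n² unknowns, so this toy example is not overdetermined. Real scans add many more beam directions.

```python
import numpy as np

def square_beam_matrix(n):
    """Coefficients a_ij for beams passing squarely through every row and every
    column of an n x n pixel grid; pixel j is numbered row by row (0-based)."""
    N = n * n
    rows = []
    for r in range(n):            # horizontal beam through pixel row r
        a = np.zeros(N)
        a[r * n:(r + 1) * n] = 1.0
        rows.append(a)
    for c in range(n):            # vertical beam through pixel column c
        a = np.zeros(N)
        a[c::n] = 1.0
        rows.append(a)
    return np.array(rows)

A = square_beam_matrix(3)
x = np.array([0.2, 0.5, 0.1, 0.4, 0.4, 0.3, 0.0, 0.6, 0.2])  # hypothetical pixel densities
b = A @ x  # beam densities b_i given by the beam equations (3)
print(A.shape, np.round(b, 4))
```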

Referring to Figure 10.12.5, however, we see that the beams of a scan do not necessarily pass through a row or column of pixels squarely. 
Instead, a typical beam passes diagonally through each pixel in its path. There are many ways to take this into account. In Figure 10.12.8 we 
outline three methods of defining the quantities a ij that appear in Equation 3, each of which reduces to our previous definition when the beam 
passes squarely through a row or column of pixels. Reading down the figure, each method is more exact than its predecessor, but with 
successively more computational difficulty. 




Using any one of the three methods to define the $a_{ij}$'s in the $i$th beam equation, we can write the set of $M$ beam equations in a complete scan as

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\
&\;\,\vdots \\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M
\end{aligned} \tag{4}$$


In this way we have a linear system of M equations (the M beam equations) in N unknowns (the N pixel densities). 


Depending on the number of beams and pixels used, we can have $M > N$, $M = N$, or $M < N$. We will consider only the case $M > N$, the
so-called overdetermined case, in which there are more beams in the scan than pixels in the field of view. Because of inherent modeling and
experimental errors in the problem, we should not expect our linear system to have an exact mathematical solution for the pixel densities. In the
next section we attempt to find an "approximate" solution to this linear system.


Algebraic Reconstruction Techniques 

There have been many mathematical algorithms devised to treat the overdetermined linear system (4). The one we will describe belongs to the class of so-called Algebraic Reconstruction Techniques (ARTs). This method, which can be traced to an iterative technique originally introduced by S. Kaczmarz in 1937, was the one used in the first commercial machine. To introduce this technique, consider the following system of three equations in two unknowns:


$$\begin{aligned} L_1:&\quad x_1 + x_2 = 2 \\ L_2:&\quad x_1 - 2x_2 = -2 \\ L_3:&\quad 3x_1 - x_2 = 3 \end{aligned} \tag{5}$$



The lines $L_1$, $L_2$, $L_3$ determined by these three equations are plotted in the $x_1x_2$-plane. As shown in Figure 10.12.9a, the three lines do not have a common intersection, and so the three equations do not have an exact solution. However, the points $(x_1, x_2)$ on the shaded triangle formed by the three lines are all situated “near” these three lines and can be thought of as constituting “approximate” solutions to our system. The following iterative procedure describes a geometric construction for generating points on the boundary of that triangular region (Figure 10.12.9b):
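The claim that the three lines have no common intersection can be checked directly by intersecting them two at a time. The sketch below uses exact rational arithmetic from Python's `fractions` module. Cramer's rule is our choice of method, not part of the text. It finds the three vertices of the triangle in Figure 10.12.9a and checks whether each vertex lies on the remaining line.

```python
from fractions import Fraction
from itertools import combinations

# Lines of system (5) stored as (a1, a2, b), meaning a1*x1 + a2*x2 = b.
LINES = {"L1": (1, 1, 2), "L2": (1, -2, -2), "L3": (3, -1, 3)}


def intersect(line, other):
    """Intersection point of two nonparallel lines, by Cramer's rule."""
    a1, a2, b = line
    c1, c2, d = other
    det = a1 * c2 - a2 * c1
    if det == 0:
        raise ValueError("lines are parallel")
    return (Fraction(b * c2 - a2 * d, det), Fraction(a1 * d - b * c1, det))


def on_line(point, line):
    """True if the point satisfies a1*x1 + a2*x2 = b exactly."""
    a1, a2, b = line
    return a1 * point[0] + a2 * point[1] == b


vertices = {(p, q): intersect(LINES[p], LINES[q]) for p, q in combinations(LINES, 2)}
for (p, q), (x1, x2) in vertices.items():
    third = next(name for name in LINES if name not in (p, q))
    print(f"{p} and {q} meet at ({x1}, {x2}); on {third}? {on_line((x1, x2), LINES[third])}")
```

The vertices are $(2/3, 4/3)$, $(5/4, 3/4)$, and $(8/5, 9/5)$. None of them lies on the third line, so system (5) is inconsistent.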

Algorithm 1 

Step 0 Choose an arbitrary starting point $\mathbf{x}_0$ in the $x_1x_2$-plane.

Step 1 Project $\mathbf{x}_0$ orthogonally onto the first line $L_1$ and call the projection $\mathbf{x}_1^{(1)}$. The superscript (1) indicates that this is the first of several cycles through the steps.

Step 2 Project $\mathbf{x}_1^{(1)}$ orthogonally onto the second line $L_2$ and call the projection $\mathbf{x}_2^{(1)}$.

Step 3 Project $\mathbf{x}_2^{(1)}$ orthogonally onto the third line $L_3$ and call the projection $\mathbf{x}_3^{(1)}$.

Step 4 Take $\mathbf{x}_3^{(1)}$ as the new value of $\mathbf{x}_0$ and cycle through Steps 1 through 3 again. In the second cycle, label the projected points $\mathbf{x}_1^{(2)}$, $\mathbf{x}_2^{(2)}$, $\mathbf{x}_3^{(2)}$; in the third cycle, label the projected points $\mathbf{x}_1^{(3)}$, $\mathbf{x}_2^{(3)}$, $\mathbf{x}_3^{(3)}$; and so forth.

This algorithm generates three sequences of points

$$\begin{aligned} L_1:&\quad \mathbf{x}_1^{(1)}, \mathbf{x}_1^{(2)}, \mathbf{x}_1^{(3)}, \ldots \\ L_2:&\quad \mathbf{x}_2^{(1)}, \mathbf{x}_2^{(2)}, \mathbf{x}_2^{(3)}, \ldots \\ L_3:&\quad \mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \mathbf{x}_3^{(3)}, \ldots \end{aligned}$$

that lie on the three lines $L_1$, $L_2$, and $L_3$, respectively. It can be shown that as long as the three lines are not all parallel, the first sequence converges to a point $\mathbf{x}_1^*$ on $L_1$, the second sequence converges to a point $\mathbf{x}_2^*$ on $L_2$, and the third sequence converges to a point $\mathbf{x}_3^*$ on $L_3$ (Figure 10.12.9c). These three limit points form what is called the limit cycle of the iterative process. It can be shown that the limit cycle is independent of the starting point $\mathbf{x}_0$.


Figure 10.12.9 (a) The lines $L_1$: $x_1 + x_2 = 2$, $L_2$: $x_1 - 2x_2 = -2$, and $L_3$: $3x_1 - x_2 = 3$ in the $x_1x_2$-plane.


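Next we discuss the specific formulas needed to effect the orthogonal projections in Algorithm 1. First, because the equation of a line in $x_1x_2$-space is

$$a_1x_1 + a_2x_2 = b$$

we can express it in vector form as

$$\mathbf{a}^T\mathbf{x} = b, \qquad \text{where } \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \text{ and } \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$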

The following theorem gives the necessary projection formula (Exercise 5). 



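THEOREM 10.12.1 Orthogonal Projection Formula

Let $L$ be a line in $R^2$ with equation $\mathbf{a}^T\mathbf{x} = b$, and let $\mathbf{x}^*$ be any point in $R^2$ (Figure 10.12.10). Then the orthogonal projection, $\mathbf{x}_p$, of $\mathbf{x}^*$ onto $L$ is given by

$$\mathbf{x}_p = \mathbf{x}^* + \frac{b - \mathbf{a}^T\mathbf{x}^*}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}$$

Figure 10.12.10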
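Theorem 10.12.1 translates almost line for line into code. This sketch stores vectors as plain Python lists, which is an illustrative choice. `dot` and `project_onto_line` are hypothetical helper names.

```python
def dot(u, v):
    """Euclidean inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_line(x_star, a, b):
    """Orthogonal projection x_p of x_star onto the line a^T x = b.

    Implements x_p = x_star + ((b - a^T x_star) / (a^T a)) a.
    Nothing here is specific to two dimensions (a must be nonzero).
    """
    t = (b - dot(a, x_star)) / dot(a, a)
    return [xi + t * ai for xi, ai in zip(x_star, a)]


# Project x* = (1, 3) onto L1: x1 + x2 = 2 (the first projection in Example 1).
x_p = project_onto_line([1, 3], [1, 1], 2)
print(x_p)
```

Only inner products appear in the formula. The same function therefore works unchanged for a hyperplane $\mathbf{a}^T\mathbf{x} = b$ in $R^N$.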


EXAMPLE 1 Using Algorithm 1 


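We can use Algorithm 1 to find an approximate solution of the linear system given in (5) and illustrated in Figure 10.12.9. If we write the equations of the three lines as

$$L_1:\ \mathbf{a}_1^T\mathbf{x} = b_1, \qquad L_2:\ \mathbf{a}_2^T\mathbf{x} = b_2, \qquad L_3:\ \mathbf{a}_3^T\mathbf{x} = b_3$$

where

$$\mathbf{a}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \qquad b_1 = 2, \quad b_2 = -2, \quad b_3 = 3$$

then, using Theorem 10.12.1, we can express the iteration scheme in Algorithm 1 as

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3$$

where $p = 1$ for the first cycle of iterates, $p = 2$ for the second cycle of iterates, and so forth. After each cycle of iterates (i.e., after $\mathbf{x}_3^{(p)}$ is computed), the next cycle of iterates is begun with $\mathbf{x}_0^{(p+1)}$ set equal to $\mathbf{x}_3^{(p)}$.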


Table 1 gives the numerical results of six cycles of iterations starting with the initial point $\mathbf{x}_0 = (1, 3)$.


Table 1

Iterate        x1        x2
x_0         1.00000   3.00000
x_1^(1)      .00000   2.00000
x_2^(1)      .40000   1.20000
x_3^(1)     1.30000    .90000
x_1^(2)     1.20000    .80000
x_2^(2)      .88000   1.44000
x_3^(2)     1.42000   1.26000
x_1^(3)     1.08000    .92000
x_2^(3)      .83200   1.41600
x_3^(3)     1.40800   1.22400
x_1^(4)     1.09200    .90800
x_2^(4)      .83680   1.41840
x_3^(4)     1.40920   1.22760
x_1^(5)     1.09080    .90920
x_2^(5)      .83632   1.41816
x_3^(5)     1.40908   1.22724
x_1^(6)     1.09092    .90908
x_2^(6)      .83637   1.41818
x_3^(6)     1.40909   1.22728


Using certain techniques that are impractical for large linear systems, we can show the exact values of the points of the limit cycle in this example to be

$$\begin{aligned} \mathbf{x}_1^* &= \left(\tfrac{12}{11}, \tfrac{10}{11}\right) = (1.09090\ldots, .90909\ldots) \\ \mathbf{x}_2^* &= \left(\tfrac{46}{55}, \tfrac{78}{55}\right) = (.83636\ldots, 1.41818\ldots) \\ \mathbf{x}_3^* &= \left(\tfrac{31}{22}, \tfrac{27}{22}\right) = (1.40909\ldots, 1.22727\ldots) \end{aligned}$$

It can be seen that the sixth cycle of iterates provides an excellent approximation to the limit cycle. Any one of the three iterates $\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, $\mathbf{x}_3^{(6)}$ can be used as an approximate solution of the linear system. (The large discrepancies in the values of $\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, and $\mathbf{x}_3^{(6)}$ are due to the artificial nature of this illustrative example. In practical problems, these discrepancies would be much smaller.)
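The six cycles in Table 1 can be reproduced with a few lines of Python. The sketch below is an illustrative implementation of Algorithm 1 for system (5), using the projection formula of Theorem 10.12.1. The names `project` and `algorithm1` are our own.

```python
# Example 1: the lines a_k^T x = b_k of system (5).
A = [(1, 1), (1, -2), (3, -1)]
B = [2, -2, 3]


def project(x, a, b):
    """Orthogonal projection of the point x onto the line a^T x = b (Theorem 10.12.1)."""
    t = (b - (a[0] * x[0] + a[1] * x[1])) / (a[0] ** 2 + a[1] ** 2)
    return (x[0] + t * a[0], x[1] + t * a[1])


def algorithm1(x0, cycles):
    """Run Algorithm 1 and return (label, point) rows in the layout of Table 1."""
    rows = [("x_0", x0)]
    x = x0
    for p in range(1, cycles + 1):
        # Each cycle starts from the last iterate of the previous cycle (Step 4).
        for k, (a, b) in enumerate(zip(A, B), start=1):
            x = project(x, a, b)
            rows.append((f"x_{k}^({p})", x))
    return rows


for label, (u, v) in algorithm1((1.0, 3.0), 6):
    print(f"{label:<8} {u:8.5f} {v:8.5f}")
```

Python prints a leading zero, so it shows 0.40000 where Table 1 writes .40000. Otherwise the printed values should match Table 1 to five decimal places. Running many more cycles drives $\mathbf{x}_3^{(p)}$ toward $\mathbf{x}_3^* = (31/22, 27/22)$.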


To generalize Algorithm 1 so that it applies to an overdetermined system of $M$ equations in $N$ unknowns,

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\ &\;\;\vdots \\ a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M \end{aligned} \tag{6}$$

we introduce column vectors $\mathbf{x}$ and $\mathbf{a}_i$ as follows:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \qquad \mathbf{a}_i = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{iN} \end{bmatrix}, \qquad i = 1, 2, \ldots, M$$

With these vectors, the $M$ equations constituting our linear system (6) can be written in vector form as

$$\mathbf{a}_i^T\mathbf{x} = b_i, \qquad i = 1, 2, \ldots, M$$

Each of these $M$ equations defines what is called a hyperplane in the $N$-dimensional Euclidean space $R^N$. In general these $M$ hyperplanes have no common intersection, and so we seek instead some point in $R^N$ that is reasonably “close” to all of them. Such a point will constitute an approximate solution of the linear system, and its $N$ entries will determine approximate pixel densities with which to form the desired cross section.






















As in the two-dimensional case, we will introduce an iterative process that generates cycles of successive orthogonal projections onto the $M$ hyperplanes beginning with some arbitrary initial point in $R^N$. Our notation for these successive iterates is

$$\mathbf{x}_k^{(p)} = \left(\begin{array}{l} \text{the iterate lying on the } k\text{th hyperplane} \\ \text{generated during the } p\text{th cycle of iterations} \end{array}\right)$$

The algorithm is as follows:

Algorithm 2

Step 0 Choose any point in $R^N$ and label it $\mathbf{x}_0$.

Step 1 For the first cycle of iterates, set $p = 1$.

Step 2 For $k = 1, 2, \ldots, M$, compute

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k$$

Step 3 Set $\mathbf{x}_0^{(p+1)} = \mathbf{x}_M^{(p)}$.

Step 4 Increase the cycle number $p$ by 1 and return to Step 2.

In Step 2 the iterate $\mathbf{x}_k^{(p)}$ is called the orthogonal projection of $\mathbf{x}_{k-1}^{(p)}$ onto the hyperplane $\mathbf{a}_k^T\mathbf{x} = b_k$. Consequently, as in the two-dimensional case, this algorithm determines a sequence of orthogonal projections from one hyperplane onto the next in which we cycle back to the first hyperplane after each projection onto the last hyperplane.

It can be shown that if the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$, then the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ lying on the $M$th hyperplane will converge to a point $\mathbf{x}_M^*$ on that hyperplane which does not depend on the choice of the initial point $\mathbf{x}_0$. In computed tomography, one of the iterates $\mathbf{x}_M^{(p)}$ for $p$ sufficiently large is taken as an approximate solution of the linear system for the pixel densities.
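Algorithm 2 is the cyclic projection scheme that goes back to Kaczmarz. Below is a dependency-free Python sketch. It assumes the rows $\mathbf{a}_k$ are given as lists and that none is the zero vector, since otherwise $\mathbf{a}_k^T\mathbf{a}_k = 0$. `art` is a hypothetical name.

```python
def art(A, b, x0, cycles):
    """Algorithm 2: cyclic orthogonal projections onto the hyperplanes a_k^T x = b_k.

    A      -- list of M rows a_k, each a list of N numbers (no row may be all zeros)
    b      -- list of M right-hand sides b_k
    x0     -- starting point in R^N (Step 0)
    cycles -- number of complete cycles p = 1, 2, ..., cycles to perform

    Returns x_M^(cycles), the iterate on the Mth hyperplane after the last cycle.
    """
    x = [float(v) for v in x0]
    for _ in range(cycles):                          # Steps 1, 3, and 4
        for a_k, b_k in zip(A, b):                   # Step 2, k = 1, 2, ..., M
            residual = b_k - sum(a * xi for a, xi in zip(a_k, x))
            t = residual / sum(a * a for a in a_k)   # divide by a_k^T a_k
            x = [xi + t * a for xi, a in zip(x, a_k)]
    return x


# With M = 3 and N = 2 this reproduces Algorithm 1 on system (5).
print(art([[1, 1], [1, -2], [3, -1]], [2, -2, 3], [1, 3], 6))
```

For a consistent system such as $2x_1 + x_2 = 3$, $x_1 + 3x_2 = 5$, the same loop converges to the exact solution $(0.8, 1.4)$. For an inconsistent system like (5), it settles onto a limit cycle instead.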


Note that for the center-of-pixel method, the scalar quantity $\mathbf{a}_k^T\mathbf{a}_k$ appearing in the equation in Step 2 of the algorithm is simply the number of pixels in which the $k$th beam passes through the center. Similarly, note that the scalar quantity

$$b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}$$

in that same equation can be interpreted as the excess $k$th beam density that results if the pixel densities are set equal to the entries of $\mathbf{x}_{k-1}^{(p)}$. This provides the following interpretation of our ART iteration scheme for the center-of-pixel method: Generate the pixel densities of each iterate by distributing the excess beam density of successive beams in the scan evenly among those pixels in which the beam passes through the center. When the last beam in the scan has been reached, return to the first beam and continue.
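For the center-of-pixel method every $a_{kj}$ is 0 or 1, so Step 2 reduces to the “distribute the excess evenly” rule just described. Here is a sketch of one such update. The function name and the 1-based pixel numbering are assumptions.

```python
def distribute_excess(x, crossed, b_k):
    """One Step-2 update for the center-of-pixel method.

    x       -- current pixel densities x_{k-1}^(p) (pixel j stored at x[j-1])
    crossed -- set of numbers of the pixels whose centers the kth beam passes through
    b_k     -- measured density of the kth beam
    """
    excess = b_k - sum(x[j - 1] for j in crossed)   # b_k - a_k^T x_{k-1}^(p)
    share = excess / len(crossed)                   # a_k^T a_k = number of pixels crossed
    return [xj + share if j in crossed else xj for j, xj in enumerate(x, start=1)]


# Hypothetical beam through the centers of pixels 7, 8, 9 with measured density 13.
print(distribute_excess([0.0] * 9, {7, 8, 9}, 13.0))
```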

EXAMPLE 2 Using Algorithm 2 

We can use Algorithm 2 to find the unknown pixel densities of the 9 pixels arranged in the 3 × 3 array illustrated in Figure 10.12.11. These 9 pixels are scanned using the parallel mode with 12 beams whose measured beam densities are indicated in the figure. We choose the center-of-pixel method to set up the 12 beam equations. (In Exercises 7 and 8, you are asked to set up the beam equations using the center line and area methods.) As you can verify, the beam equations are

$$\begin{aligned}
x_7 + x_8 + x_9 &= 13.00 \\
x_4 + x_5 + x_6 &= 15.00 \\
x_1 + x_2 + x_3 &= 8.00 \\
x_6 + x_8 + x_9 &= 14.79 \\
x_3 + x_5 + x_7 &= 14.31 \\
x_1 + x_2 + x_4 &= 3.81 \\
x_3 + x_6 + x_9 &= 18.00 \\
x_2 + x_5 + x_8 &= 12.00 \\
x_1 + x_4 + x_7 &= 6.00 \\
x_2 + x_3 + x_6 &= 10.51 \\
x_1 + x_5 + x_9 &= 16.13 \\
x_4 + x_7 + x_8 &= 7.04
\end{aligned}$$


Table 2 illustrates the results of the iteration scheme starting with an initial $\mathbf{x}_0 = \mathbf{0}$. The table gives the values of each iterate of the first cycle, $\mathbf{x}_1^{(1)}$ through $\mathbf{x}_{12}^{(1)}$, but thereafter gives the iterates $\mathbf{x}_{12}^{(p)}$ only for various values of $p$. The iterates $\mathbf{x}_{12}^{(p)}$ start repeating to two decimal places for $p > 45$, and so we take the entries of $\mathbf{x}_{12}^{(45)}$ as approximate values of the 9 pixel densities.




Figure 10.12.11 


Table 2

                                  Pixel Densities
Iterate       x1     x2     x3     x4     x5     x6     x7     x8     x9
x_0           .00    .00    .00    .00    .00    .00    .00    .00    .00
x_1^(1)       .00    .00    .00    .00    .00    .00   4.33   4.33   4.33
x_2^(1)       .00    .00    .00   5.00   5.00   5.00   4.33   4.33   4.33
x_3^(1)      2.67   2.67   2.67   5.00   5.00   5.00   4.33   4.33   4.33
x_4^(1)      2.67   2.67   2.67   5.00   5.00   5.37   4.33   4.71   4.71
x_5^(1)      2.67   2.67   3.44   5.00   5.77   5.37   5.10   4.71   4.71
x_6^(1)       .49    .49   3.44   2.83   5.77   5.37   5.10   4.71   4.71
x_7^(1)       .49    .49   4.93   2.83   5.77   6.87   5.10   4.71   6.20
x_8^(1)       .49    .84   4.93   2.83   6.11   6.87   5.10   5.05   6.20
x_9^(1)      -.31    .84   4.93   2.02   6.11   6.87   4.30   5.05   6.20
x_10^(1)     -.31    .13   4.22   2.02   6.11   6.16   4.30   5.05   6.20
x_11^(1)     1.06    .13   4.22   2.02   7.49   6.16   4.30   5.05   7.58
x_12^(1)     1.06    .13   4.22    .58   7.49   6.16   2.85   3.61   7.58
x_12^(2)     2.03    .69   4.42   1.34   7.49   5.39   2.65   3.04   6.61
x_12^(3)     1.78    .51   4.52   1.26   7.49   5.48   2.56   3.22   6.86
x_12^(4)     1.82    .52   4.62   1.37   7.49   5.37   2.45    …     6.82
x_12^(5)     1.79    .49   4.71   1.43   7.49   5.31   2.37   3.25   6.85
x_12^(10)    1.68    .44   5.03   1.70   7.49   5.03   2.04   3.29   6.96
x_12^(20)    1.49    .48   5.29   2.00   7.49   4.73   1.79   3.25   7.15
x_12^(30)    1.38    .55   5.34   2.11   7.49   4.62   1.74   3.19   7.26
x_12^(40)    1.33    .59   5.33   2.14   7.49   4.59   1.75   3.15   7.31
x_12^(45)    1.32    .60   5.32   2.15   7.49   4.59   1.76   3.14   7.32
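The first cycle of Table 2 can be reproduced with the even-distribution form of Step 2. The sketch hard-codes the 12 beams of Example 2 in the order in which the iteration processes them, which is the order that reproduces Table 2. `BEAMS`, `art_cycle`, and `pixel_densities` are illustrative names.

```python
# Center-of-pixel beam equations of Example 2, in processing order:
# each entry is (pixels whose centers the beam crosses, measured beam density).
BEAMS = [
    ((7, 8, 9), 13.00), ((4, 5, 6), 15.00), ((1, 2, 3), 8.00),
    ((6, 8, 9), 14.79), ((3, 5, 7), 14.31), ((1, 2, 4), 3.81),
    ((3, 6, 9), 18.00), ((2, 5, 8), 12.00), ((1, 4, 7), 6.00),
    ((2, 3, 6), 10.51), ((1, 5, 9), 16.13), ((4, 7, 8), 7.04),
]


def art_cycle(x):
    """One full cycle of Algorithm 2: returns x_12^(p) given x_0^(p)."""
    x = list(x)
    for pixels, b_k in BEAMS:
        # Spread the excess beam density evenly over the crossed pixels.
        share = (b_k - sum(x[j - 1] for j in pixels)) / len(pixels)
        for j in pixels:
            x[j - 1] += share
    return x


def pixel_densities(cycles, x0=None):
    """Run the given number of cycles from x0 (default: the zero vector)."""
    x = [0.0] * 9 if x0 is None else list(x0)
    for _ in range(cycles):
        x = art_cycle(x)
    return x


for p in (1, 2, 5, 45):
    print(p, " ".join(f"{v:5.2f}" for v in pixel_densities(p)))
```

The first printed row should agree with $\mathbf{x}_{12}^{(1)}$ in Table 2 to two decimal places. The later rows can be compared with the corresponding rows of the table.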


We close this section by noting that the field of computed tomography is presently a very active research area. In fact, the ART scheme 
discussed here has been replaced in commercial systems by more sophisticated techniques that are faster and provide a more accurate view of 
the cross section. However, all the new techniques address the same basic mathematical problem: finding a good approximate solution of a 
large overdetermined inconsistent linear system of equations. 


Exercise Set 10.12 


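1.

(a) Setting $\mathbf{x}_k^{(p)} = \left(x_{k1}^{(p)}, x_{k2}^{(p)}\right)$, show that the three projection equations

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3$$

for the three lines in Equation 5 can be written as

$$\begin{aligned}
k = 1:&\quad x_{11}^{(p)} = \tfrac{1}{2}\left[2 + x_{01}^{(p)} - x_{02}^{(p)}\right], \qquad x_{12}^{(p)} = \tfrac{1}{2}\left[2 - x_{01}^{(p)} + x_{02}^{(p)}\right] \\
k = 2:&\quad x_{21}^{(p)} = \tfrac{1}{5}\left[-2 + 4x_{11}^{(p)} + 2x_{12}^{(p)}\right], \qquad x_{22}^{(p)} = \tfrac{1}{5}\left[4 + 2x_{11}^{(p)} + x_{12}^{(p)}\right] \\
k = 3:&\quad x_{31}^{(p)} = \tfrac{1}{10}\left[9 + x_{21}^{(p)} + 3x_{22}^{(p)}\right], \qquad x_{32}^{(p)} = \tfrac{1}{10}\left[-3 + 3x_{21}^{(p)} + 9x_{22}^{(p)}\right]
\end{aligned}$$

where $\left(x_{01}^{(p+1)}, x_{02}^{(p+1)}\right) = \left(x_{31}^{(p)}, x_{32}^{(p)}\right)$ for $p = 1, 2, \ldots$.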

(b) Show that the three pairs of equations in part (a) can be combined to produce

$$\begin{aligned} x_{31}^{(p)} &= \tfrac{1}{20}\left[28 + x_{31}^{(p-1)} - x_{32}^{(p-1)}\right] \\ x_{32}^{(p)} &= \tfrac{1}{20}\left[24 + 3x_{31}^{(p-1)} - 3x_{32}^{(p-1)}\right] \end{aligned} \qquad p = 1, 2, \ldots$$

[Note: Using this pair of equations, we can perform one complete cycle of three orthogonal projections in a single step.]

(c) Because $\mathbf{x}_3^{(p)}$ tends to the limit point $\mathbf{x}_3^*$ as $p \to \infty$, the equations in part (b) become

$$\begin{aligned} x_{31}^* &= \tfrac{1}{20}\left[28 + x_{31}^* - x_{32}^*\right] \\ x_{32}^* &= \tfrac{1}{20}\left[24 + 3x_{31}^* - 3x_{32}^*\right] \end{aligned}$$

as $p \to \infty$. Solve this linear system for $\mathbf{x}_3^* = (x_{31}^*, x_{32}^*)$. [Note: The simplifications of the ART formulas described in this exercise are impractical for the large linear systems that arise in realistic computed tomography problems.]


Answer:

$\mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$



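2. Use the result of Exercise 1(b) to find $\mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \ldots, \mathbf{x}_3^{(6)}$ to five decimal places in Example 1 using the following initial points:

(a) $\mathbf{x}_0 = (0, 0)$

(b) $\mathbf{x}_0 = (1, 1)$

(c) $\mathbf{x}_0 = (148, -15)$

Answer:

(a) $\mathbf{x}_3^{(1)} = (1.40000, 1.20000)$
$\mathbf{x}_3^{(2)} = (1.41000, 1.23000)$
$\mathbf{x}_3^{(3)} = (1.40900, 1.22700)$
$\mathbf{x}_3^{(4)} = (1.40910, 1.22730)$
$\mathbf{x}_3^{(5)} = (1.40909, 1.22727)$
$\mathbf{x}_3^{(6)} = (1.40909, 1.22727)$

(b) Same as part (a)

(c) $\mathbf{x}_3^{(1)} = (9.55000, 25.65000)$
$\mathbf{x}_3^{(2)} = (.59500, -1.21500)$
$\mathbf{x}_3^{(3)} = (1.49050, 1.47150)$
$\mathbf{x}_3^{(4)} = (1.40095, 1.20285)$
$\mathbf{x}_3^{(5)} = (1.40991, 1.22972)$
$\mathbf{x}_3^{(6)} = (1.40901, 1.22703)$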
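Exercise 1(b) collapses a whole cycle of three projections into one affine map, which makes the answers to Exercise 2 easy to check. In this sketch the starting point $\mathbf{x}_0$ plays the role of $\mathbf{x}_3^{(0)}$, which is an assumption about how the $p = 1$ case is read. The function names are illustrative.

```python
def full_cycle(u, v):
    """One complete cycle of three projections in Example 1, via Exercise 1(b):
    maps x_3^(p-1) = (u, v) to x_3^(p)."""
    return (28 + u - v) / 20, (24 + 3 * u - 3 * v) / 20


def limit_cycle_iterates(x0, cycles=6):
    """Return x_3^(1), ..., x_3^(cycles) rounded to five decimal places."""
    u, v = x0
    out = []
    for _ in range(cycles):
        u, v = full_cycle(u, v)
        out.append((round(u, 5), round(v, 5)))
    return out


for x0 in [(0, 0), (1, 1), (148, -15)]:
    print(x0, limit_cycle_iterates(x0))
```

The starting points $(0, 0)$ and $(1, 1)$ give identical sequences. This is because the map depends on $\mathbf{x}_3^{(p-1)}$ only through the difference $x_{31}^{(p-1)} - x_{32}^{(p-1)}$.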


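3.

(a) Show directly that the points of the limit cycle in Example 1,

$$\mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \quad \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \quad \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$$

form a triangle whose vertices lie on the lines $L_1$, $L_2$, and $L_3$ and whose sides are perpendicular to these lines (Figure 10.12.9c).

(b) Using the equations derived in Exercise 1(a), show that if $\mathbf{x}_0^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$, then

$$\mathbf{x}_1^{(1)} = \mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \quad \mathbf{x}_2^{(1)} = \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \quad \mathbf{x}_3^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$$

[Note: Either part of this exercise shows that successive orthogonal projections of any point on the limit cycle will move around the limit cycle indefinitely.]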


4. The following three lines in the $x_1x_2$-plane,

$$L_1:\ x_2 = 1, \qquad L_2:\ x_1 - x_2 = 2, \qquad L_3:\ x_1 - x_2 = 0$$

do not have a common intersection. Draw an accurate sketch of the three lines and graphically perform several cycles of the orthogonal projections described in Algorithm 1, beginning with the initial point $\mathbf{x}_0 = (0, 0)$. On the basis of your sketch, determine the three points of the limit cycle.

Answer:

$\mathbf{x}_1^* = (1, 1)$, $\mathbf{x}_2^* = (2, 0)$, $\mathbf{x}_3^* = (1, 1)$

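5. Prove Theorem 10.12.1 by verifying that

(a) the point $\mathbf{x}_p$ as defined in the theorem lies on the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{a}^T\mathbf{x}_p = b$).

(b) the vector $\mathbf{x}_p - \mathbf{x}^*$ is orthogonal to the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{x}_p - \mathbf{x}^*$ is parallel to $\mathbf{a}$).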

6. As stated in the text, the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ defined in Algorithm 2 will converge to a unique limit point if the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$. Show that if this is the case and if the center-of-pixel method is used, then the center of each of the $N$ pixels in the field of view is crossed by at least one of the $M$ beams in the scan.

7. Construct the 12 beam equations in Example 2 using the center line method. Assume that the distance between the center lines of adjacent 
beams is equal to the width of a single pixel. 


Answer: 




$$\begin{aligned}
x_7 + x_8 + x_9 &= 13.00 \\
x_4 + x_5 + x_6 &= 15.00 \\
x_1 + x_2 + x_3 &= 8.00 \\
.82843(x_6 + x_8) + .58579x_9 &= 14.79 \\
1.41421(x_3 + x_5 + x_7) &= 14.31 \\
.82843(x_2 + x_4) + .58579x_1 &= 3.81 \\
x_3 + x_6 + x_9 &= 18.00 \\
x_2 + x_5 + x_8 &= 12.00 \\
x_1 + x_4 + x_7 &= 6.00 \\
.82843(x_2 + x_6) + .58579x_3 &= 10.51 \\
1.41421(x_1 + x_5 + x_9) &= 16.13 \\
.82843(x_4 + x_8) + .58579x_7 &= 7.04
\end{aligned}$$

8. Construct the 12 beam equations in Example 2 using the area method. Assume that the width of each beam is equal to the width of a single pixel and that the distance between the center lines of adjacent beams is also equal to the width of a single pixel.

Answer:

$$\begin{aligned}
x_7 + x_8 + x_9 &= 13.00 \\
x_4 + x_5 + x_6 &= 15.00 \\
x_1 + x_2 + x_3 &= 8.00 \\
.04289(x_3 + x_5 + x_7) + .75000(x_6 + x_8) + .61396x_9 &= 14.79 \\
.91421(x_3 + x_5 + x_7) + .25000(x_2 + x_4 + x_6 + x_8) &= 14.31 \\
.04289(x_3 + x_5 + x_7) + .75000(x_2 + x_4) + .61396x_1 &= 3.81 \\
x_3 + x_6 + x_9 &= 18.00 \\
x_2 + x_5 + x_8 &= 12.00 \\
x_1 + x_4 + x_7 &= 6.00 \\
.04289(x_1 + x_5 + x_9) + .75000(x_2 + x_6) + .61396x_3 &= 10.51 \\
.91421(x_1 + x_5 + x_9) + .25000(x_2 + x_4 + x_6 + x_8) &= 16.13 \\
.04289(x_1 + x_5 + x_9) + .75000(x_4 + x_8) + .61396x_7 &= 7.04
\end{aligned}$$

Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or
Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each 
exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your 
technology utility to solve many of the problems in the regular exercise sets. 


T1. Given the set of equations

$$a_kx + b_ky = c_k$$

for $k = 1, 2, 3, \ldots, n$ (with $n > 2$), let us consider the following algorithm for obtaining an approximate solution to the system.

1. Solve all possible pairs of equations

$$a_ix + b_iy = c_i \quad \text{and} \quad a_jx + b_jy = c_j$$

for $i, j = 1, 2, 3, \ldots, n$ and $i < j$ for their unique solutions. This leads to $\tfrac{1}{2}n(n - 1)$ solutions, which we label as

$$(x_{ij}, y_{ij})$$

for $i, j = 1, 2, 3, \ldots, n$ and $i < j$.

2. Construct the geometric center of these points defined by

$$(x_c, y_c) = \left(\frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_{ij},\ \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} y_{ij}\right)$$

and use this as the approximate solution to the original system.

Use this algorithm to approximate the solution to the system

$$\begin{aligned} x + y &= 2 \\ x - 2y &= -2 \\ 3x - y &= 3 \end{aligned}$$

and compare your results to those in this section.
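Here is one way to carry out T1 in Python, treating Python as one possible “technology utility.” It assumes no two of the lines are parallel, so that every pair has a unique solution, and it uses exact fractions to avoid rounding. The function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations


def pairwise_center(equations):
    """Geometric center of the n(n-1)/2 pairwise solutions of a_k x + b_k y = c_k.

    equations -- list of (a_k, b_k, c_k) with integer entries;
    assumes no two of the lines are parallel.
    """
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(equations, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            raise ValueError("two of the lines are parallel")
        # Cramer's rule for the 2 x 2 system formed by equations i < j.
        points.append((Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)))
    count = len(points)  # equals n(n - 1)/2
    return (sum(x for x, _ in points) / count, sum(y for _, y in points) / count)


x_c, y_c = pairwise_center([(1, 1, 2), (1, -2, -2), (3, -1, 3)])
print(x_c, y_c, float(x_c), float(y_c))
```

For the system above this gives $(x_c, y_c) = (211/180, 233/180) \approx (1.1722, 1.2944)$. That is the centroid of the triangle whose vertices are the three pairwise intersection points.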

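T2. (Calculus required) Given the set of equations

$$a_kx + b_ky = c_k$$

for $k = 1, 2, 3, \ldots, n$ (with $n > 2$), let us consider the following least squares algorithm for obtaining an approximate solution $(\bar{x}, \bar{y})$ to the system. Given a point $(x_0, y_0)$ and the line $a_ix + b_iy = c_i$, the distance from this point to the line is given by

$$\frac{|a_ix_0 + b_iy_0 - c_i|}{\sqrt{a_i^2 + b_i^2}}$$

If we define a function $f(x, y)$ by

$$f(x, y) = \sum_{i=1}^{n} \frac{(a_ix + b_iy - c_i)^2}{a_i^2 + b_i^2}$$

and then determine the point $(\bar{x}, \bar{y})$ that minimizes this function, we will determine the point that is closest to each of these lines in a summed least squares sense. Show that $\bar{x}$ and $\bar{y}$ are solutions to the system

$$\left(\sum_{i=1}^{n}\frac{a_i^2}{a_i^2 + b_i^2}\right)\bar{x} + \left(\sum_{i=1}^{n}\frac{a_ib_i}{a_i^2 + b_i^2}\right)\bar{y} = \sum_{i=1}^{n}\frac{a_ic_i}{a_i^2 + b_i^2}$$

and

$$\left(\sum_{i=1}^{n}\frac{a_ib_i}{a_i^2 + b_i^2}\right)\bar{x} + \left(\sum_{i=1}^{n}\frac{b_i^2}{a_i^2 + b_i^2}\right)\bar{y} = \sum_{i=1}^{n}\frac{b_ic_i}{a_i^2 + b_i^2}$$

Apply this algorithm to the system

$$\begin{aligned} x + y &= 2 \\ x - 2y &= -2 \\ 3x - y &= 3 \end{aligned}$$

and compare your results to those in this section.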
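Here is a sketch of the T2 computation in Python. It accumulates the weighted sums in the two normal equations and solves the resulting $2 \times 2$ system by Cramer's rule, assuming that system is nonsingular (not all lines parallel). Exact fractions are an illustrative choice, and the function name is hypothetical.

```python
from fractions import Fraction


def weighted_least_squares(equations):
    """Minimize f(x, y) = sum (a_i x + b_i y - c_i)^2 / (a_i^2 + b_i^2).

    Builds and solves the two normal equations of Exercise T2 exactly.
    equations -- list of (a_i, b_i, c_i) integers with a_i, b_i not both zero.
    """
    saa = sab = sbb = sac = sbc = Fraction(0)
    for a, b, c in equations:
        w = Fraction(1, a * a + b * b)  # weight 1/(a_i^2 + b_i^2)
        saa += w * a * a
        sab += w * a * b
        sbb += w * b * b
        sac += w * a * c
        sbc += w * b * c
    det = saa * sbb - sab * sab
    # Cramer's rule for [saa sab; sab sbb][x; y] = [sac; sbc].
    return (sac * sbb - sab * sbc) / det, (saa * sbc - sab * sac) / det


x_bar, y_bar = weighted_least_squares([(1, 1, 2), (1, -2, -2), (3, -1, 3)])
print(x_bar, y_bar, float(x_bar), float(y_bar))
```

For the three-equation system above the minimizer comes out as $(\bar{x}, \bar{y}) = (12/11, 27/22) \approx (1.0909, 1.2273)$.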
10.13 Fractals

In this section we will use certain classes of linear transformations to describe and generate intricate sets in the Euclidean plane. These sets, called fractals, are 
currently the focus of much mathematical and scientific research. 


Prerequisites 

Geometry of Linear Operators on $R^2$ (Section 4.11)

Euclidean Space $R^n$

Natural Logarithms 

Intuitive Understanding of Limits 


Fractals in the Euclidean Plane 

At the end of the nineteenth century and the beginning of the twentieth century, various bizarre and wild sets of points in the Euclidean plane began appearing in 
mathematics. Although they were initially mathematical curiosities, these sets, called fractals, are rapidly growing in importance. It is now recognized that they reveal 
a regularity in physical and biological phenomena previously dismissed as “random,” “noisy,” or “chaotic.” For example, fractals are all around us in the shapes of 
clouds, mountains, coastlines, trees, and ferns. 

In this section we give a brief description of certain types of fractals in the Euclidean plane. Much of this description is an outgrowth of the work of two mathematicians, Benoit B. Mandelbrot and Michael Barnsley, who are both active researchers in the field.


Self-Similar Sets 


To begin our study of fractals, we need to introduce some terminology about sets in $R^2$. We will call a set in $R^2$ bounded if it can be enclosed by a suitably large circle (Figure 10.13.1) and closed if it contains all of its boundary points (Figure 10.13.2). Two sets in $R^2$ will be called congruent if they can be made to coincide exactly by translating and rotating them appropriately within $R^2$ (Figure 10.13.3). We will also rely on your intuitive concept of overlapping and nonoverlapping sets, as illustrated in Figure 10.13.4.


Figure 10.13.1 (a) Set enclosed by a circle. (b) Unbounded set: this set cannot be enclosed by any circle.

Figure 10.13.2 Closed set: the boundary points (solid color) lie in the set.

Figure 10.13.3 Congruent sets.

Figure 10.13.4 (a) Overlapping sets. (b) Nonoverlapping sets.

If $T: R^2 \to R^2$ is the linear operator that scales by a factor of $s$ (see Table 7 of Section 4.9), and if $Q$ is a set in $R^2$, then the set $T(Q)$ (the set of images of points in $Q$ under $T$) is called a dilation of the set $Q$ if $s > 1$ and a contraction of $Q$ if $0 < s < 1$ (Figure 10.13.5). In either case we say that $T(Q)$ is the set $Q$ scaled by the factor $s$.



The types of fractals we will consider first are called self-similar. In general, we define a self-similar set in $R^2$ as follows:

DEFINITION 1

A closed and bounded subset of the Euclidean plane $R^2$ is said to be self-similar if it can be expressed in the form

$$S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k \tag{1}$$

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets, each of which is congruent to $S$ scaled by the same factor $s$ $(0 < s < 1)$.

If $S$ is a self-similar set, then (1) is sometimes called a decomposition of $S$ into nonoverlapping congruent sets.

EXAMPLE 1 Line Segment 


A line segment in $R^2$ (Figure 10.13.6a) can be expressed as the union of two nonoverlapping congruent line segments (Figure 10.13.6b). In Figure 10.13.6b we have separated the two line segments slightly so that they can be seen more easily. Each of these two smaller line segments is congruent to the original line segment scaled by a factor of $\tfrac{1}{2}$. Hence, a line segment is a self-similar set with $k = 2$ and $s = \tfrac{1}{2}$.

Figure 10.13.6















EXAMPLE 2 Square 


A square (Figure 10.13.7a) can be expressed as the union of four nonoverlapping congruent squares (Figure 10.13.7b), where we have again separated the smaller squares slightly. Each of the four smaller squares is congruent to the original square scaled by a factor of $\tfrac{1}{2}$. Hence, a square is a self-similar set with $k = 4$ and $s = \tfrac{1}{2}$.

Figure 10.13.7


EXAMPLE 3 Sierpinski Carpet 


The set suggested by Figure 10.13.8a, the Sierpinski “carpet,” was first described by the Polish mathematician Waclaw Sierpinski (1882-1969). It can be expressed as the union of eight nonoverlapping congruent subsets (Figure 10.13.8b), each of which is congruent to the original set scaled by a factor of $\tfrac{1}{3}$. Hence, it is a self-similar set with $k = 8$ and $s = \tfrac{1}{3}$. Note that the intricate square-within-a-square pattern continues forever on a smaller and smaller scale (although this can only be suggested in a figure such as the one shown).

Figure 10.13.8


EXAMPLE 4 Sierpinski Triangle 

Figure 10.13.9a illustrates another set described by Sierpinski. It is a self-similar set with $k = 3$ and $s = \tfrac{1}{2}$ (Figure 10.13.9b). As with the Sierpinski carpet, the intricate triangle-within-a-triangle pattern continues forever on a smaller and smaller scale.

Figure 10.13.9


The Sierpinski carpet and triangle have a more intricate structure than the line segment and the square in that they exhibit a pattern that is repeated indefinitely. This 
difference will be explored later in this section. 


Topological Dimension of a Set 

In Section 4.5 we defined the dimension of a subspace of a vector space to be the number of vectors in a basis, and we found that definition to coincide with our intuitive sense of dimension. For example, the origin of $R^2$ is zero-dimensional, lines through the origin are one-dimensional, and $R^2$ itself is two-dimensional. This definition of dimension is a special case of a more general concept called topological dimension, which is applicable to sets in $R^n$ that are not necessarily subspaces. A precise definition of this concept is studied in a branch of mathematics called topology. Although that definition is beyond the scope of this text, we can state informally that

a point in $R^2$ has topological dimension zero;
a curve in $R^2$ has topological dimension one;
a region in $R^2$ has topological dimension two.

It can be proved that the topological dimension of a set in $R^n$ must be an integer between 0 and $n$, inclusive. In this text we will denote the topological dimension of a set $S$ by $d_T(S)$.

EXAMPLE 5 Topological Dimensions of Sets 

Table 1 gives the topological dimensions of the sets studied in our earlier examples. The first two results in this table are intuitively obvious; however, 
the last two are not. Informally stated, the Sierpinski carpet and triangle both contain so many “holes” that those sets resemble web-like networks of 
lines rather than regions. Hence they have topological dimension one. The proofs are quite difficult. 

Table 1 


Set S                  d_T(S)
Line segment             1
Square                   2
Sierpinski carpet        1
Sierpinski triangle      1


Hausdorff Dimension of a Self-Similar Set 

In 1919 the German mathematician Felix Hausdorff (1868-1942) gave an alternative definition for the dimension of an arbitrary set in R n . His definition is quite 
complicated, but for a self-similar set, it reduces to something rather simple: 


DEFINITION 2

The Hausdorff dimension of a self-similar set $S$ of form (1) is denoted by $d_H(S)$ and is defined by

$$d_H(S) = \frac{\ln k}{\ln(1/s)} \tag{2}$$

In this definition, “ln” denotes the natural logarithm function. Equation 2 can also be expressed as

$$s^{d_H(S)} = \frac{1}{k} \tag{3}$$

in which the Hausdorff dimension $d_H(S)$ appears as an exponent. Formula 3 is more helpful for interpreting the concept of Hausdorff dimension. It states, for example, that if you scale a self-similar set by a factor of $s = \tfrac{1}{2}$, then its area (or more properly its measure) decreases by a factor of $\left(\tfrac{1}{2}\right)^{d_H(S)}$. Thus, scaling a line segment by a factor of $\tfrac{1}{2}$ reduces its measure (length) by a factor of $\left(\tfrac{1}{2}\right)^1 = \tfrac{1}{2}$, and scaling a square region by a factor of $\tfrac{1}{2}$ reduces its measure (area) by a factor of $\left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}$.

Before proceeding to some examples, we should note a few facts about the Hausdorff dimension of a set: 

The topological dimension and Hausdorff dimension of a set need not be the same. 

The Hausdorff dimension of a set need not be an integer. 

The topological dimension of a set is less than or equal to its Hausdorff dimension; that is, $d_T(S) \le d_H(S)$.

EXAMPLE 6 Hausdorff Dimensions of Sets 

Table 2 lists the Hausdorff dimensions of the sets studied in our earlier examples. 

Table 2

Set S                   s      k      d_H(S) = ln k / ln(1/s)
Line segment           1/2     2      ln 2/ln 2 = 1
Square                 1/2     4      ln 4/ln 2 = 2
Sierpinski carpet      1/3     8      ln 8/ln 3 = 1.892...
Sierpinski triangle    1/2     3      ln 3/ln 2 = 1.584...


Fractals 

Comparing Tables 1 and 2, we see that the Hausdorff and topological dimensions are equal for both the line segment and square but are unequal for the Sierpinski 
carpet and triangle. In 1977 Benoit B. Mandelbrot suggested that sets for which the topological and Hausdorff dimensions differ must be quite complicated (as 
Hausdorff had earlier suggested in 1919). Mandelbrot proposed calling such sets fractals , and he offered the following definition. 



DEFINITION 3 

A fractal is a subset of a Euclidean space whose Hausdorff dimension and topological dimension are not equal. 


According to this definition, the Sierpinski carpet and Sierpinski triangle are fractals, whereas the line segment and square are not.
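Equation 2 and Definition 3 can be checked numerically for the four examples. The sketch takes $k$, $s$, and $d_T(S)$ from Tables 1 and 2. It compares the two dimensions with a floating-point tolerance, which is an implementation choice.

```python
import math


def hausdorff_dimension(k, s):
    """d_H(S) = ln k / ln(1/s) for a self-similar set of k pieces scaled by s (Equation 2)."""
    return math.log(k) / math.log(1 / s)


# (k, s, topological dimension d_T(S)) taken from Tables 1 and 2.
SETS = {
    "Line segment": (2, 1 / 2, 1),
    "Square": (4, 1 / 2, 2),
    "Sierpinski carpet": (8, 1 / 3, 1),
    "Sierpinski triangle": (3, 1 / 2, 1),
}

for name, (k, s, d_t) in SETS.items():
    d_h = hausdorff_dimension(k, s)
    # Definition 3: a fractal is a set whose d_H and d_T differ.
    is_fractal = not math.isclose(d_h, d_t)
    print(f"{name:20s} d_H = {d_h:.4f}  d_T = {d_t}  fractal: {is_fractal}")
```

The same function lets you confirm Formula 3: `s ** hausdorff_dimension(k, s)` agrees with `1 / k` up to rounding.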

It follows from the preceding definition that a set whose Hausdorff dimension is not an integer must be a fractal (why?). However, we will see later that the converse 
is not true; that is, it is possible for a fractal to have an integer Hausdorff dimension. 


Similitudes 

We will now show how some techniques from linear algebra can be used to generate fractals. This linear algebra approach also leads to algorithms that can be 
exploited to draw fractals on a computer. We begin with a definition. 


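DEFINITION 4

A similitude with scale factor $s$ is a mapping of $R^2$ into $R^2$ of the form

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = s\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$$

where $s$, $\theta$, $e$, and $f$ are scalars.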


Geometrically, a similitude is a composition of three simpler mappings: a scaling by a factor of $s$, a rotation about the origin through an angle $\theta$, and a translation ($e$ units in the $x$-direction and $f$ units in the $y$-direction). Figure 10.13.10 illustrates the effect of a similitude on the unit square $U$.
Figure 10.13.10 (a) Unit square. (b) Unit square after similitude.

For our application to fractals, we will need only similitudes that are contractions, by which we mean that the scale factor $s$ is restricted to the range $0 < s < 1$.
Consequently, when we refer to similitudes we will always mean similitudes subject to this restriction. 

Similitudes are important in the study of fractals because of the following fact: 


If $T: R^2 \to R^2$ is a similitude with scale factor $s$ and if $S$ is a closed and bounded set in $R^2$, then the image $T(S)$ of the set $S$ under $T$ is congruent to $S$ scaled by $s$.
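Here is a small sketch of Definition 4 with hypothetical parameter values. It applies a similitude to the corners of the unit square $U$ and measures the image, illustrating that $T(U)$ is congruent to $U$ scaled by $s$. The helper name `similitude` is our own.

```python
import math


def similitude(s, theta, e, f):
    """Return T(x, y) = s * R(theta) [x, y]^T + [e, f]^T (Definition 4)."""
    c, sn = math.cos(theta), math.sin(theta)

    def T(point):
        x, y = point
        return (s * (c * x - sn * y) + e, s * (sn * x + c * y) + f)

    return T


# Hypothetical parameters: s = 1/2, theta = 45 degrees, translation (e, f) = (1, 0.25).
T = similitude(0.5, math.pi / 4, 1.0, 0.25)
U = [(0, 0), (1, 0), (1, 1), (0, 1)]  # corners of the unit square
image = [T(p) for p in U]
sides = [math.dist(image[i], image[(i + 1) % 4]) for i in range(4)]
print(image)
print(sides)  # each side length should be s = 0.5, up to rounding
```

Rotation and translation preserve lengths and angles, so every distance in $T(U)$ is exactly $s$ times the corresponding distance in $U$.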


Recall from the definition of a self-similar set in $R^2$ that a closed and bounded set $S$ in $R^2$ is self-similar if it can be expressed in the form

$$S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k$$

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets each of which is congruent to $S$ scaled by the same factor $s$ $(0 < s < 1)$ [see (1)]. In the following examples, we will find similitudes that produce the sets $S_1, S_2, S_3, \ldots, S_k$ from $S$ for the line segment, square, Sierpinski carpet, and Sierpinski triangle.

EXAMPLE 7 Line Segment 


We will take as our line segment the line segment $S$ connecting the points $(0, 0)$ and $(1, 0)$ in the $xy$-plane (Figure 10.13.11a). Consider the two similitudes

$$T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix} \tag{4}$$

both of which have $s = \tfrac{1}{2}$ and $\theta = 0$. In Figure 10.13.11b we show how these two similitudes map the unit square $U$. The similitude $T_1$ maps $U$ onto the smaller square $T_1(U)$, and the similitude $T_2$ maps $U$ onto the smaller square $T_2(U)$. At the same time, $T_1$ maps the line segment $S$ onto the smaller line segment $T_1(S)$, and $T_2$ maps $S$ onto the smaller nonoverlapping line segment $T_2(S)$. The union of these two smaller nonoverlapping line segments is precisely the original line segment $S$; that is,

$$S = T_1(S) \cup T_2(S) \tag{5}$$

Figure 10.13.11


EXAMPLE 8 Square 


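Let us consider the unit square $U$ in the $xy$-plane (Figure 10.13.12a) and the following four similitudes, all having $s = \tfrac{1}{2}$ and $\theta = 0$:

$$\begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, &
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix} \\
T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix}, &
T_4\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix}
\end{aligned} \tag{6}$$

The images of the unit square $U$ under these four similitudes are the four squares shown in Figure 10.13.12b. Thus,

$$U = T_1(U) \cup T_2(U) \cup T_3(U) \cup T_4(U) \tag{7}$$

is a decomposition of $U$ into four nonoverlapping squares that are congruent to $U$ scaled by the same scale factor $\left(s = \tfrac{1}{2}\right)$.

Figure 10.13.12 (a) The unit square $U$ with vertices $(0, 0)$, $(1, 0)$, $(1, 1)$, $(0, 1)$. (b) The four images of $U$.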


EXAMPLE 9 Sierpinski Carpet 


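Let us consider a Sierpinski carpet S over the unit square U of the xy-plane (Figure 10.13.13a) and the following eight similitudes, all having s = 1/3 and $\theta = 0$:

$$T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \qquad i = 1, 2, 3, \ldots, 8 \tag{8}$$

where the eight values of $\begin{bmatrix} e_i \\ f_i \end{bmatrix}$ are

$$\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1/3 \end{bmatrix}, \begin{bmatrix} 0 \\ 2/3 \end{bmatrix}, \begin{bmatrix} 1/3 \\ 0 \end{bmatrix}, \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}, \begin{bmatrix} 2/3 \\ 0 \end{bmatrix}, \begin{bmatrix} 2/3 \\ 1/3 \end{bmatrix}, \begin{bmatrix} 2/3 \\ 2/3 \end{bmatrix}$$

The images of S under these eight similitudes are the eight sets shown in Figure 10.13.13b. Thus,

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_8(S) \tag{9}$$

is a decomposition of S into eight nonoverlapping sets that are congruent to S scaled by the same scale factor (s = 1/3).

Figure 10.13.13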
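The eight translation vectors need not be typed in by hand: they are the lower-left corners of the 3 × 3 grid of subsquares of side 1/3, with the middle one omitted. The Python sketch below is our own illustration. It uses exact `Fraction` arithmetic to generate the corners and checks that the eight image squares $T_i(U)$ have pairwise overlap area zero and total area 8/9.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Lower-left corners (e_i, f_i) of the eight outer subsquares of side 1/3.
translations = [(i * third, j * third)
                for i in range(3) for j in range(3)
                if (i, j) != (1, 1)]

def overlap_area(a, b, side):
    """Area shared by two axis-aligned squares of equal side with lower-left corners a, b."""
    dx = max(Fraction(0), side - abs(a[0] - b[0]))
    dy = max(Fraction(0), side - abs(a[1] - b[1]))
    return dx * dy

# Distinct subsquares meet at most along edges, so every overlap area is zero.
pairwise = [overlap_area(a, b, third)
            for k, a in enumerate(translations) for b in translations[k + 1:]]
print(len(translations), max(pairwise), len(translations) * third ** 2)
```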


EXAMPLE 10 Sierpinski Triangle 

Let us consider a Sierpinski triangle S fitted inside the unit square U of the xy-plane, as shown in Figure 10.13.14a, and the following three similitudes, all having s = 1/2 and $\theta = 0$:

$$T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \qquad i = 1, 2, 3 \tag{10}$$

The images of S under these three similitudes are the three sets in Figure 10.13.14b. Thus,

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \tag{11}$$

is a decomposition of S into three nonoverlapping sets that are congruent to S scaled by the same scale factor (s = 1/2).
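The Hausdorff dimension formula of Equation 2, which the exercise answers evaluate as $d_H(S) = \ln k / \ln(1/s)$, can be applied to Examples 7 through 10 with a few lines of Python. The helper name `hausdorff_dimension` is hypothetical. Like the formula, it assumes that the k scaled copies are nonoverlapping.

```python
import math

def hausdorff_dimension(k, s):
    """d_H(S) = ln k / ln(1/s) for a self-similar set made of k
    nonoverlapping copies of itself, each scaled by 0 < s < 1."""
    return math.log(k) / math.log(1 / s)

examples = {
    "line segment (Example 7)": (2, 1 / 2),
    "square (Example 8)": (4, 1 / 2),
    "Sierpinski carpet (Example 9)": (8, 1 / 3),
    "Sierpinski triangle (Example 10)": (3, 1 / 2),
}
for name, (k, s) in examples.items():
    print(f"{name}: d_H = {hausdorff_dimension(k, s):.4f}")
```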
In the preceding examples we started with a specific set S and showed that it was self-similar by finding similitudes $T_1, T_2, T_3, \ldots, T_k$ with the same scale factor such that $T_1(S), T_2(S), T_3(S), \ldots, T_k(S)$ were nonoverlapping sets and such that

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S)$$

The following theorem addresses the converse problem of determining a self-similar set from a collection of similitudes. 


THEOREM 10.13.1 

If $T_1, T_2, T_3, \ldots, T_k$ are contracting similitudes with the same scale factor, then there is a unique nonempty closed and bounded set S in the Euclidean plane $R^2$ such that

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S)$$

Furthermore, if the sets $T_1(S), T_2(S), T_3(S), \ldots, T_k(S)$ are nonoverlapping, then S is self-similar.


Algorithms for Generating Fractals 


In general, there is no simple way to obtain the set S in the preceding theorem directly. We now describe an iterative procedure that will determine S from the 
similitudes that define it. We first give an example of the procedure and then give an algorithm for the general case. 

EXAMPLE 11 Sierpinski Carpet 


Figure 10.13.15 shows the unit square region $S_0$ in the xy-plane, which will serve as an “initial” set for an iterative procedure for the construction of the Sierpinski carpet. The set $S_1$ in the figure is the result of mapping $S_0$ with each of the eight similitudes $T_i$ (i = 1, 2, ..., 8) in Equation 8 that determine the Sierpinski carpet. It consists of eight square regions, each of side length 1/3, surrounding an empty middle square. Next we apply the eight similitudes to $S_1$ and arrive at the set $S_2$. Similarly, applying the eight similitudes to $S_2$ results in the set $S_3$. If we continue this process indefinitely, the sequence of sets $S_1, S_2, S_3, \ldots$ will “converge” to a set S, which is the Sierpinski carpet.

Figure 10.13.15








Although we should properly give a definition of what it means for a sequence of sets to “converge” to a given set, an intuitive interpretation will suffice in 
this introductory treatment. 


Although we started in Figure 10.13.15 with the unit square region to arrive at the Sierpinski carpet, we could have started with any nonempty set $S_0$. The only restriction is that the set $S_0$ be closed and bounded. For example, if we start with the particular set $S_0$ shown in Figure 10.13.16, then $S_1$ is the set obtained by applying each of the eight similitudes in Equation 8. Applying the eight similitudes to $S_1$ results in the set $S_2$. As before, applying the eight similitudes indefinitely yields the Sierpinski carpet S as the limiting set.

Figure 10.13.16

The general algorithm illustrated in the preceding example is as follows: Let $T_1, T_2, T_3, \ldots, T_k$ be contracting similitudes with the same scale factor, and for an arbitrary set Q in $R^2$, define the set $J(Q)$ by

$$J(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q) \cup \cdots \cup T_k(Q)$$

The following algorithm generates a sequence of sets $S_0, S_1, \ldots, S_n, \ldots$ that converges to the set S in Theorem 10.13.1.

Algorithm 1

Step 0 Choose an arbitrary nonempty closed and bounded set $S_0$ in $R^2$.

Step 1 Compute $S_1 = J(S_0)$.

Step 2 Compute $S_2 = J(S_1)$.

Step 3 Compute $S_3 = J(S_2)$.

Step n Compute $S_n = J(S_{n-1})$.
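Algorithm 1 can be sketched on a computer by representing sets as collections of pixels. The Python sketch below is one such approximation. The 81 × 81 grid, the function name `J`, and the rule of sending each pixel to the pixel that contains the image of its lower-left corner are all our own assumptions. It starts from the unit square and applies the eight carpet similitudes of Equation 8 four times. At this resolution the pixel areas of $S_0, \ldots, S_4$ come out as exactly $(8/9)^n$, matching the answer to Exercise 12.

```python
from fractions import Fraction

N = 4                 # resolution exponent (assumed): grid of P x P pixels
P = 3 ** N
OFFSETS = [(a, b) for a in range(3) for b in range(3) if (a, b) != (1, 1)]

def J(Q):
    """Pixel approximation of J(Q) = T_1(Q) U ... U T_8(Q) for Equation 8.

    Pixel (i, j) stands for the point (i/P, j/P). A similitude shrinks by
    1/3 and shifts by (a/3, b/3), so on the grid i -> i // 3 + a * P // 3.
    """
    return {(i // 3 + a * P // 3, j // 3 + b * P // 3)
            for (i, j) in Q for (a, b) in OFFSETS}

S = {(i, j) for i in range(P) for j in range(P)}   # Step 0: the unit square S_0
areas = [Fraction(len(S), P * P)]
for n in range(1, N + 1):                           # Steps 1..N: S_n = J(S_{n-1})
    S = J(S)
    areas.append(Fraction(len(S), P * P))

for n, area in enumerate(areas):
    print(f"area of S_{n} = {area} = {float(area):.4f}")
```

Because every $S_n$ with $n \le N$ is a union of aligned pixel blocks, shrinking by 3 on the grid loses nothing. Beyond n = N, the resolution would have to be increased.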

EXAMPLE 12 Sierpinski Triangle 

Let us construct the Sierpinski triangle determined by the three similitudes given in Equation 10. The corresponding set mapping is $J(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q)$. Figure 10.13.17 shows an arbitrary closed and bounded set $S_0$; the first four iterates $S_1, S_2, S_3, S_4$; and the limiting set S (the Sierpinski triangle).

Figure 10.13.17

EXAMPLE 13 Using Algorithm 1


Consider the following two similitudes: 




The actions of these two similitudes on the unit square U are illustrated in Figure 10.13.18. Here, the rotation angle $\theta$ is a parameter that we will vary to generate different self-similar sets. The self-similar sets determined by these two similitudes are shown in Figure 10.13.19 for various values of $\theta$. For simplicity, we have not drawn the xy-axes, but in each case the origin is the lower left point of the set. These sets were generated on a computer using Algorithm 1 for the various values of $\theta$. Because k = 2 and s = 1/2, it follows from Equation 2 that the Hausdorff dimension of these sets for any value of $\theta$ is $\ln 2 / \ln 2 = 1$. It can be shown that the topological dimension of these sets is 1 for $\theta = 0$ and 0 for all other values of $\theta$. It follows that the self-similar set for $\theta = 0$ is not a fractal [it is the straight line segment from (0, 0) to (.6, .6)], while the self-similar sets for all other values of $\theta$ are fractals. In particular, they are examples of fractals with integer Hausdorff dimension.

A Monte Carlo Approach













The set-mapping approach of constructing self-similar sets described in Algorithm 1 is rather time-consuming on a computer because the similitudes involved must be 
applied to each of the many computer screen pixels in the successive iterated sets. In 1985 Michael Barnsley described an alternative, more practical method of 
generating a self-similar set defined through its similitudes. It is a so-called Monte Carlo method that takes advantage of probability theory. Barnsley refers to it as 
the Random Iteration Algorithm. 


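Let $T_1, T_2, T_3, \ldots, T_k$ be contracting similitudes with the same scale factor. The following algorithm generates a sequence of points

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}, \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}, \ldots, \begin{bmatrix} x_n \\ y_n \end{bmatrix}, \ldots$$

that collectively converge to the set S in Theorem 10.13.1.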

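Algorithm 2

Step 0 Choose an arbitrary point $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ in S.

Step 1 Choose one of the k similitudes at random, say $T_{k_1}$, and compute

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = T_{k_1}\left(\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\right)$$

Step 2 Choose one of the k similitudes at random, say $T_{k_2}$, and compute

$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = T_{k_2}\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}\right)$$

Step n Choose one of the k similitudes at random, say $T_{k_n}$, and compute

$$\begin{bmatrix} x_n \\ y_n \end{bmatrix} = T_{k_n}\left(\begin{bmatrix} x_{n-1} \\ y_{n-1} \end{bmatrix}\right)$$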


On a computer screen the pixels corresponding to the points generated by this algorithm will fill out the pixel representation of the limiting set S. 

Figure 10.13.20 shows four stages of the Random Iteration Algorithm that generate the Sierpinski carpet, starting with the initial point . 

Although Step 0 in the preceding algorithm requires the selection of an initial point in the set S, which may not be known in advance, this is not a serious problem. In practice, one can usually start with any point in $R^2$, and after a few iterations (say ten or so), the point generated will be sufficiently close to S that the algorithm will work correctly from that point on.

Figure 10.13.20


100,000 iterations 
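Here is a minimal Python sketch of the Random Iteration Algorithm for the Sierpinski carpet similitudes of Equation 8. The start point (0.5, 0.5), the random seed, and the burn-in of ten discarded points are illustrative choices, not values from the text. The start point deliberately lies in the empty middle square, outside S, which mirrors the remark that Step 0 need not be followed literally.

```python
import random

third = 1 / 3
OFFSETS = [(a * third, b * third)
           for a in range(3) for b in range(3) if (a, b) != (1, 1)]

def random_iteration(n_points, x0=0.5, y0=0.5, burn_in=10, seed=0):
    """Random Iteration Algorithm for the carpet similitudes of Equation 8.

    Each step picks one of the eight maps T_i(x, y) = (x/3 + e_i, y/3 + f_i)
    uniformly at random. The first `burn_in` points are discarded.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    points = []
    for step in range(burn_in + n_points):
        e, f = rng.choice(OFFSETS)
        x, y = x / 3 + e, y / 3 + f
        if step >= burn_in:
            points.append((x, y))
    return points

pts = random_iteration(10_000)
print(len(pts), pts[:3])
```

A scatter plot of `pts` shows the carpet emerging. After the first step, no point can fall in the open middle square, because every map sends the unit square into one of the eight outer subsquares.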


More General Fractals 

So far, we have discussed fractals that are self-similar sets according to the definition of a self-similar set in $R^2$. However, Theorem 10.13.1 remains true if the similitudes $T_1, T_2, \ldots, T_k$ are replaced by more general transformations, called contracting affine transformations. An affine transformation is defined as follows:


DEFINITION 5


An affine transformation is a mapping of $R^2$ into $R^2$ of the form

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$$

where a, b, c, d, e, and f are scalars.




Figure 10.13.21 shows how an affine transformation maps the unit square U onto a parallelogram T(U). An affine transformation is said to be contracting if the Euclidean distance between any two points in the plane is strictly decreased after the two points are mapped by the transformation. It can be shown that any k contracting affine transformations $T_1, T_2, \ldots, T_k$ determine a unique nonempty closed and bounded set S satisfying the equation

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \tag{13}$$
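Whether a given affine transformation is contracting depends only on its matrix part A. Since $\|T(\mathbf{u}) - T(\mathbf{v})\| = \|A(\mathbf{u} - \mathbf{v})\|$, the transformation T is contracting exactly when the largest singular value of A is less than 1. The NumPy sketch below applies that test. The helper name is our own. For a similitude, the largest singular value is simply the scale factor s.

```python
import numpy as np

def is_contracting(A):
    """True if T(x) = A x + b strictly decreases all distances.

    ||T(u) - T(v)|| = ||A (u - v)|| <= ||A||_2 ||u - v||, with equality
    for some u - v, so T is contracting exactly when the largest singular
    value ||A||_2 of the matrix part is < 1. The shift b plays no role.
    """
    return bool(np.linalg.norm(np.asarray(A, dtype=float), 2) < 1)

print(is_contracting([[0.5, 0.0], [0.0, 0.5]]))   # similitude with s = 1/2
print(is_contracting([[0.5, 0.9], [0.0, 0.5]]))   # this shear stretches some direction
```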


Equation 13 has the same form as Equation 12, which we used to find self-similar sets. Although Equation 13, which uses contracting affine transformations, does not in general determine a self-similar set S, the set it does determine has many of the features of self-similar sets. For example, Figure 10.13.22 shows how a set in the plane resembling a fern (an example made famous by Barnsley) can be generated through four contracting affine transformations. Note that the middle fern is the slightly overlapping union of the four smaller affine-image ferns surrounding it. Note also how $T_3$, because the determinant of its matrix part is zero, maps the entire fern onto the small straight line segment between the points (.50, 0) and (.50, .16). Figure 10.13.22 contains a wealth of information and should be studied carefully.
Michael Barnsley has applied the above theory to the field of data compression and transmission. The fern, for example, is completely determined by the four affine transformations $T_1, T_2, T_3, T_4$. These four transformations, in turn, are determined by the 24 numbers given in Figure 10.13.22 defining their corresponding values of a, b, c, d, e, and f. In other words, these 24 numbers completely encode the picture of the fern. Storing these 24 numbers in a computer requires considerably less memory space than storing a pixel-by-pixel description of the fern. In principle, any picture represented by a pixel map on a computer screen can be described through a finite number of affine transformations, although it is not easy to determine which transformations to use. Nevertheless, once encoded, the affine transformations generally require several orders of magnitude less computer memory than a pixel-by-pixel description of the pixel map.


Further Readings 

Readers interested in learning more about fractals are referred to the following books, the first of which elaborates on the linear transformation approach of 
this section. 

1. Michael Barnsley, Fractals Everywhere (New York: Academic Press, 1993). 

2. Benoit B. Mandelbrot, The Fractal Geometry of Nature (New York: W. H. Freeman, 1982). 

3. Heinz-Otto Peitgen and P. H. Richter, The Beauty of Fractals (New York: Springer-Verlag, 1986). 

4. Heinz-Otto Peitgen and Dietmar Saupe, The Science of Fractal Images (New York: Springer-Verlag, 1988). 


Exercise Set 10.13 

1. The self-similar set in Figure Ex-1 has the sizes indicated. Given that its lower left corner is situated at the origin of the xy-plane, find the similitudes that determine the set. What is its Hausdorff dimension? Is it a fractal?

Figure Ex-1


Answer: 

$d_H(S) = \ln(4)/\ln(25/12) = 1.888\ldots$

2. Find the Hausdorff dimension of the self-similar set shown in Figure Ex-2. Use a ruler to measure the figure and determine an approximate value of the scale factor s. What are the rotation angles of the similitudes determining this set?

Figure Ex-2


Answer: 

$s \approx .47$; $d_H(S) \approx \ln(4)/\ln(1/.47) = 1.8\ldots$. Rotation angles: 0° (upper left); $-90°$ (upper right); 180° (lower left); 180° (lower right)

3. Each of the 12 self-similar sets in Figure Ex-3 results from three similitudes with scale factor 1/2, and so all have Hausdorff dimension $\ln 3/\ln 2 = 1.584\ldots$. The rotation angles of the three similitudes are all multiples of 90°. Find these rotation angles for each set and express them as a triplet of integers $(n_1, n_2, n_3)$, where $n_i$ is the corresponding integer multiple of 90° in the order upper right, lower left, lower right. For example, the first set (the Sierpinski triangle) generates the triplet (0, 0, 0).






















Answer: 

(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), (1, 2, 0), (2, 1, 3), (2, 0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3) 
4. For each of the self-similar sets in Figure Ex-4, find: 

(i) the scale factor s of the similitudes describing the set; 

(ii) the rotation angles $\theta$ of all similitudes describing the set (all rotation angles are multiples of 90°); and

(iii) the Hausdorff dimension of the set. 

Which of the sets are fractals and why? 


Figure Ex-4


Answer: 

(a) (i) s = 1/3; (ii) all rotation angles are 0°; (iii) $d_H(S) = \ln(7)/\ln(3) = 1.771\ldots$. This set is a fractal.

(b) (i) s = 1/2; (ii) all rotation angles are 180°; (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

(c) (i) s = 1/2; (ii) rotation angles: $-90°$ (top); 180° (lower left); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

(d) (i) s = 1/2; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

5. Show that of the four affine transformations shown in Figure 10.13.22, only one is a similitude. Determine its scale factor s and rotation angle $\theta$.


Answer: 






$s = .8509\ldots$, $\theta = -2.69\ldots°$


6. Find the coordinates of the tip of the fern in Figure 10.13.22. [Hint: The similitude of Exercise 5 maps the tip of the fern to itself.]

Answer: 

(0.766, 0.996) rounded to three decimal places 

7. The square in Figure 10.13.7a was expressed as the union of 4 nonoverlapping squares as in Figure 10.13.7b. Suppose that it is expressed instead as the union of
16 nonoverlapping squares. Verify that its Hausdorff dimension is still 2, as determined by Equation 2. 

Answer: 

$d_H(S) = \ln(16)/\ln(4) = 2$

8. Show that the four similitudes 
express the unit square as the union of four overlapping squares. Evaluate the right-hand side of Equation 2 for the values of k and s determined by these
similitudes, and show that the result is not the correct value of the Hausdorff dimension of the unit square. [Note: This exercise shows the necessity of the 
nonoverlapping condition in the definition of a self-similar set and its Hausdorff dimension.] 


Answer:

$\ln(k)/\ln(1/s) = \ln(4)/\ln(4/3) = 4.818\ldots$

9. All of the results in this section can be extended to $R^n$. Compute the Hausdorff dimension of the unit cube in $R^3$ (see Figure Ex-9). Given that the topological
dimension of the unit cube is 3, determine whether it is a fractal. [Hint: Express the unit cube as the union of eight smaller congruent nonoverlapping cubes.] 



Figure Ex-9 


Answer: 

$d_H(S) = \ln(8)/\ln(2) = 3$; the cube is not a fractal.

10. The set in $R^3$ in Figure Ex-10 is called the Menger sponge. It is a self-similar set obtained by drilling out certain square holes from the unit cube. Note that each
face of the Menger sponge is a Sierpinski carpet and that the holes in the Sierpinski carpet now run all the way through the Menger sponge. Determine the values 
of k and s for the Menger sponge and find its Hausdorff dimension. Is the Menger sponge a fractal? 
































Answer: 


k = 20; s = 1/3; $d_H(S) = \ln(20)/\ln(3) = 2.726\ldots$; the set is a fractal.

11. The two similitudes


and


determine a fractal known as the Cantor set. Starting with the unit square region U as an initial set, sketch the first four sets that Algorithm 1 determines. Also,
find the Hausdorff dimension of the Cantor set. (This famous set was the first example that Hausdorff gave in his 1919 paper of a set whose Hausdorff dimension 
is not equal to its topological dimension.) 


Answer: 


Initial set 


First iterate 

Second iterate 

Third iterate 
Fourth iterate 

$d_H(S) = \ln(2)/\ln(3) = 0.6309\ldots$

12. Compute the areas of the sets $S_0$, $S_1$, $S_2$, $S_3$, and $S_4$ in Figure 10.13.15.

Answer: 

Area of $S_0$ = 1; area of $S_1$ = 8/9 = 0.888...; area of $S_2$ = $(8/9)^2$ = 0.790...; area of $S_3$ = $(8/9)^3$ = 0.702...; area of $S_4$ = $(8/9)^4$ = 0.624...

Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may
also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you 
have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. 













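T1. Use similitudes of the form

$$T_i\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} a_{1i} \\ a_{2i} \\ a_{3i} \end{bmatrix}$$

to show that the Menger sponge (see Exercise 10) is the set S satisfying

$$S = \bigcup_{i=1}^{20} T_i(S)$$

for appropriately chosen similitudes $T_i$ (for i = 1, 2, 3, ..., 20). Determine these similitudes by determining the collection of 3 × 1 matrices

$$\begin{bmatrix} a_{1i} \\ a_{2i} \\ a_{3i} \end{bmatrix} \quad \text{for } i = 1, 2, 3, \ldots, 20$$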


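T2. Generalize the ideas involved in the Cantor set (in $R^1$), the Sierpinski carpet (in $R^2$), and the Menger sponge (in $R^3$) to $R^n$ by considering the set S satisfying

$$S = \bigcup_{i=1}^{m_n} T_i(S)$$

with

$$T_i\left(\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{ni} \end{bmatrix}$$

where each $a_{ki}$ equals 0, 1/3, or 2/3, and no two of them ever equal 1/3 at the same time. Use a computer to construct the set

$$\left\{ \begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{ni} \end{bmatrix} : i = 1, 2, 3, \ldots, m_n \right\}$$

thereby determining the value of $m_n$ for n = 2, 3, 4. Then develop an expression for $m_n$.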
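For Technology Exercise T2, one way to construct the translation vectors is to enumerate all n-tuples over {0, 1/3, 2/3} and keep those with at most one entry equal to 1/3. The Python sketch below does this. The function name is our own. Its counts for n = 2 and n = 3 agree with the eight carpet similitudes of Example 9 and the k = 20 similitudes of the Menger sponge in Exercise 10.

```python
from fractions import Fraction
from itertools import product

ONE_THIRD = Fraction(1, 3)

def translation_vectors(n):
    """All n-tuples with entries 0, 1/3, 2/3 in which at most one entry is 1/3."""
    levels = (Fraction(0), ONE_THIRD, 2 * ONE_THIRD)
    return [v for v in product(levels, repeat=n)
            if sum(1 for a in v if a == ONE_THIRD) <= 1]

for n in (2, 3, 4):
    print(f"n = {n}: m_n = {len(translation_vectors(n))}")
```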


Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



10.14 Chaos 

In this section we use a map of the unit square in the xy-plane onto itself to describe the concept of a chaotic mapping.


Prerequisites 

Geometry of Linear Operators on $R^2$ (Section 4.11)

Eigenvalues and Eigenvectors 

Intuitive Understanding of Limits and Continuity 


Chaos 

The word chaos was first used in a mathematical sense in 1975 by Tien-Yien Li and James Yorke in a paper entitled “Period 
Three Implies Chaos.” The term is now used to describe the behavior of certain mathematical mappings and physical phenomena 
that at first glance seem to behave in a random or disorderly fashion but actually have an underlying element of order (examples 
include random-number generation, shuffling cards, cardiac arrhythmia, fluttering airplane wings, changes in the red spot of 
Jupiter, and deviations in the orbit of Pluto). In this section we discuss a particular chaotic mapping called Arnold's cat map, after
the Russian mathematician Vladimir I. Arnold who first described it using a diagram of a cat. 


Arnold's Cat Map 


To describe Arnold's cat map, we need a few ideas about modular arithmetic. If x is a real number, then the notation x mod 1
denotes the unique number in the interval [0, 1) that differs from x by an integer. For example,

2.3 mod 1 = 0.3, 0.9 mod 1 = 0.9, $-3.7$ mod 1 = 0.3, 2.0 mod 1 = 0

Note that if x is a nonnegative number, then x mod 1 is simply the fractional part of x. If (x, y) is an ordered pair of real numbers,
then the notation (x, y) mod 1 denotes (x mod 1, y mod 1). For example,

(2.3, $-7.9$) mod 1 = (0.3, 0.1)

Observe that for every real number x, the point x mod 1 lies in the unit interval [0, 1) and that for every ordered pair (x, y), the
point (x, y) mod 1 lies in the unit square

$$S = \{(x, y) \mid 0 \le x < 1,\ 0 \le y < 1\}$$

Also observe that the upper boundary and the right-hand boundary of the square are not included in S. 
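In Python, the `%` operator already computes x mod 1 as defined here, because its result takes the sign of the divisor. The sketch below uses `fractions.Fraction` so that the examples above come out exactly; with floats, `2.3 % 1` returns a value that is only approximately 0.3. The helper names are our own.

```python
from fractions import Fraction

def mod1(x):
    """Return x mod 1: the unique number in [0, 1) that differs from x by an integer.

    Python's % gives a result with the sign of the divisor, so negative x
    is handled correctly.
    """
    return x % 1

def mod1_pair(x, y):
    """Apply mod 1 to each coordinate of an ordered pair."""
    return (mod1(x), mod1(y))

for s in ("2.3", "0.9", "-3.7", "2.0"):
    print(s, "mod 1 =", mod1(Fraction(s)))
print(mod1_pair(Fraction("2.3"), Fraction("-7.9")))
```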


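Arnold's cat map is the transformation $\Gamma: R^2 \to R^2$ defined by the formula

$$\Gamma: (x, y) \to (x + y,\ x + 2y) \bmod 1$$

or, in matrix notation,

$$\Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 \tag{1}$$

To understand the geometry of Arnold's cat map, it is helpful to write Equation 1 in the factored form

$$\Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1$$

which expresses Arnold's cat map as the composition of a shear in the x-direction with factor 1, followed by a shear in the y-direction with factor 1. Because the computations are performed mod 1, $\Gamma$ maps all points of $R^2$ into the unit square S.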
















We will illustrate the effect of Arnold's cat map on the unit square S, which is shaded in Figure 10.14.1a and contains a picture of 
a cat. It can be shown that it does not matter whether the mod 1 computations are carried out after each shear or at the very end. 
We will discuss both methods, first performing them at the end. The steps are as follows: 

Step 1 Shear in the x-direction with factor 1 (Figure 10.14.1b):

$$(x, y) \to (x + y, y)$$

or, in matrix notation,

$$\begin{bmatrix} x + y \\ y \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

Step 2 Shear in the y-direction with factor 1 (Figure 10.14.1c):

$$(x, y) \to (x, x + y)$$

or, in matrix notation,

$$\begin{bmatrix} x \\ x + y \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

Step 3 Reassembly into S (Figure 10.14.1d):

$$(x, y) \to (x, y) \bmod 1$$

The geometric effect of the mod 1 arithmetic is to break up the parallelogram in Figure 10.14.1c and reassemble the pieces in S, as shown in Figure 10.14.1d.
Figure 10.14.1

For computer implementation, it is more convenient to perform the mod 1 arithmetic at each step rather than at the end. With this approach there is a reassembly at each step, but the net effect is the same. The steps are as follows:

Step 1 Shear in the x-direction with factor 1, followed by a reassembly into S (Figure 10.14.2b):

$$(x, y) \to (x + y, y) \bmod 1$$

Step 2 Shear in the y-direction with factor 1, followed by a reassembly into S (Figure 10.14.2c):

$$(x, y) \to (x, x + y) \bmod 1$$



Figure 10.14.2 
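The claim that it does not matter whether the mod 1 reduction is done after each shear or only at the end can be spot-checked in Python. The sketch below compares the two orders on random rational points, some of them outside S, using exact `Fraction` arithmetic. The function names and the sampling scheme are our own.

```python
import random
from fractions import Fraction

def cat_map(x, y):
    """Arnold's cat map with a single mod 1 reduction at the end (Equation 1)."""
    return ((x + y) % 1, (x + 2 * y) % 1)

def cat_map_stepwise(x, y):
    """The same map as two shears, with a reassembly (mod 1) after each shear."""
    x = (x + y) % 1          # Step 1: shear in the x-direction, then reassemble
    y = (x + y) % 1          # Step 2: shear in the y-direction, then reassemble
    return (x, y)

rng = random.Random(1)
for _ in range(1000):
    x = Fraction(rng.randrange(-3000, 3000), 1000)
    y = Fraction(rng.randrange(-3000, 3000), 1000)
    assert cat_map(x, y) == cat_map_stepwise(x, y)
print("both orders agree on 1000 random points")
```

The two orders agree because the first reduction changes x only by an integer, and an integer change in x changes x + y only by an integer, which the second reduction removes.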















































Repeated Mappings 


Chaotic mappings such as Arnold's cat map usually arise in physical models in which an operation is performed repeatedly. For 
example, cards are mixed by repeated shuffles, paint is mixed by repeated stirs, water in a tidal basin is mixed by repeated tidal 
changes, and so forth. Thus, we are interested in examining the effect on S of repeated applications (or iterations) of Arnold's cat 
map. Figure 10.14.3, which was generated on a computer, shows the effect of 25 iterations of Arnold's cat map on the cat in the 
unit square S. Two interesting phenomena occur: 

The cat returns to its original form at the 25th iteration. 

At some of the intermediate iterations, the cat is decomposed into streaks that seem to have a specific direction. 

Much of the remainder of this section is devoted to explaining these phenomena. 







Figure 10.14.3 


Periodic Points 

Our first goal is to explain why the cat in Figure 10.14.3 returns to its original configuration at the 25th iteration. For this purpose 
it will be helpful to think of a picture in the xy-plane as an assignment of colors to the points in the plane. For pictures generated 
on a computer screen or other digital device, hardware limitations require that a picture be broken up into discrete squares, called 
pixels. For example, in the computer-generated pictures in Figure 10.14.3 the unit square S is divided into a grid with 101 pixels 
on a side for a total of 10,201 pixels, each of which is black or white (Figure 10.14.4). An assignment of colors to pixels to create a picture is called a pixel map.



Figure 10.14.4 

As shown in Figure 10.14.5, each pixel in S can be assigned a unique pair of coordinates of the form (m/101, n/101) that
identifies its lower left-hand corner, where m and n are integers in the range 0, 1, 2, ..., 100. We call these points pixel points
because each such point identifies a unique pixel. Instead of restricting the discussion to the case where S is subdivided into an
array with 101 pixels on a side, let us consider the more general case where there are p pixels per side. Thus, each pixel map in S
consists of $p^2$ pixels uniformly spaced 1/p units apart in both the x- and the y-directions. The pixel points in S have coordinates
of the form (m/p, n/p), where m and n are integers ranging from 0 to $p - 1$.


Figure 10.14.5 



Under Arnold's cat map each pixel point of S is transformed into another pixel point of S. To see why this is so, observe that the image of the pixel point (m/p, n/p) under $\Gamma$ is given in matrix form by

$$\Gamma\left(\begin{bmatrix} m/p \\ n/p \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} m/p \\ n/p \end{bmatrix} \bmod 1 = \begin{bmatrix} (m + n)/p \\ (m + 2n)/p \end{bmatrix} \bmod 1 \tag{2}$$

The ordered pair ((m + n)/p, (m + 2n)/p) mod 1 is of the form (m'/p, n'/p), where m' and n' lie in the range 0, 1, 2, ..., $p - 1$. Specifically, m' and n' are the remainders when m + n and m + 2n are divided by p, respectively. Consequently, each point in S of the form (m/p, n/p) is mapped onto another point of the same form.

Because Arnold's cat map transforms every pixel point of S into another pixel point of S, because distinct pixel points have distinct images (the matrix in Formula 2 has the integer inverse $\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$), and because there are only $p^2$ different pixel points in S, it follows that any given pixel point must return to its original position after at most $p^2$ iterations of Arnold's cat map.
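Formula 2 shows that on pixel points the cat map is just integer arithmetic mod p, so periods are easy to compute. In the Python sketch below, whose helper names are hypothetical, a pixel point (m/p, n/p) is stored as the integer pair (m, n).

```python
def cat_map_pixel(m, n, p):
    """Image of the pixel point (m/p, n/p) under Arnold's cat map (Formula 2).

    Points are stored by their integer labels (m, n). The result (m', n')
    holds the remainders of m + n and m + 2n on division by p.
    """
    return ((m + n) % p, (m + 2 * n) % p)

def period(m, n, p):
    """Smallest k >= 1 such that k iterations return (m/p, n/p) to itself.

    The loop terminates because the map permutes the p**2 pixel points.
    """
    point, k = cat_map_pixel(m, n, p), 1
    while point != (m, n):
        point, k = cat_map_pixel(*point, p), k + 1
    return k

print(period(1, 0, 3))
print(max(period(m, n, 76) for m in range(76) for n in range(76)))
```

For instance, `period(1, 0, 3)` is 4: the point (1/3, 0) visits (1/3, 1/3), (2/3, 0), and (2/3, 2/3) before it returns.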


EXAMPLE 1 Using Formula 2 


If p = 76, then Formula 2 becomes

$$\Gamma\left(\begin{bmatrix} m/76 \\ n/76 \end{bmatrix}\right) = \begin{bmatrix} (m + n)/76 \\ (m + 2n)/76 \end{bmatrix} \bmod 1$$

(verify). Because the point returns to its initial position on the ninth application of Arnold’s cat map (but no sooner),
the point is said to have period 9, and the set of nine distinct iterates of the point is called a 9-cycle. Figure 10.14.6 
shows this 9-cycle with the initial point labeled 0 and its successive iterates labeled accordingly. 



Figure 10.14.6 


In general, a point that returns to its initial position after n applications of Arnold’s cat map, but does not return with fewer than n 
applications, is said to have period n , and its set of n distinct iterates is called an n-cycle. Arnold's cat map maps (0, 0) into 
(0, 0), so this point has period 1. Points with period 1 are also called fixed points. We leave it as an exercise (Exercise 11) to 
show that (0, 0) is the only fixed point of Arnold's cat map. 


Period Versus Pixel Width 

If $P_1$ and $P_2$ are points with periods $q_1$ and $q_2$, respectively, then $P_1$ returns to its initial position in $q_1$ iterations (but no sooner), and $P_2$ returns to its initial position in $q_2$ iterations (but no sooner); thus, both points return to their initial positions in any number of iterations that is a multiple of both $q_1$ and $q_2$. In general, for a pixel map with $p^2$ pixel points of the form (m/p, n/p), we let $\Pi(p)$ denote the least common multiple of the periods of all the pixel points in the map [i.e., $\Pi(p)$ is the smallest positive integer that is divisible by all of the periods]. It follows that the pixel map will return to its initial configuration in $\Pi(p)$ iterations of Arnold's cat map (but no sooner). For this reason, we call $\Pi(p)$ the period of the pixel map. In Exercise 4 we ask you to show that if p = 101, then all pixel points have period 1, 5, or 25, so $\Pi(101) = 25$. This explains why the cat in Figure 10.14.3 returned to its initial configuration in 25 iterations.
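$\Pi(p)$ can be computed directly as the least common multiple of all pixel-point periods. A faster route is to find the smallest k for which the k-th power of the matrix in Equation 1 is the identity mod p, because that power fixes every pixel point exactly when it fixes (1/p, 0) and (0, 1/p). The sketch below implements both methods and reproduces $\Pi(101) = 25$ and $\Pi(250) = 750$ from the text. The function names are our own, and calling `math.lcm` with more than two arguments requires Python 3.9 or later.

```python
from math import lcm   # lcm with more than two arguments needs Python 3.9+

def cat_map_pixel(m, n, p):
    """Formula 2 on integer labels: (m, n) -> ((m + n) mod p, (m + 2n) mod p)."""
    return ((m + n) % p, (m + 2 * n) % p)

def period(m, n, p):
    """Number of iterations until the pixel point (m/p, n/p) first returns."""
    point, k = cat_map_pixel(m, n, p), 1
    while point != (m, n):
        point, k = cat_map_pixel(*point, p), k + 1
    return k

def pixel_map_period(p):
    """Pi(p) as the least common multiple of the periods of all p**2 pixel points."""
    return lcm(*(period(m, n, p) for m in range(p) for n in range(p)))

def pixel_map_period_fast(p):
    """Pi(p) as the smallest k >= 1 with C**k = I (mod p), C = [[1, 1], [1, 2]].

    Assumes p >= 2. C**k fixes every pixel point exactly when its columns
    (the images of (1/p, 0) and (0, 1/p)) are those of the identity mod p.
    """
    a, b, c, d = 1 % p, 1 % p, 1 % p, 2 % p     # entries of C**k, row by row
    k = 1
    while (a, b, c, d) != (1, 0, 0, 1):
        # right-multiply by C: [[a, b], [c, d]] @ [[1, 1], [1, 2]]
        a, b, c, d = (a + b) % p, (a + 2 * b) % p, (c + d) % p, (c + 2 * d) % p
        k += 1
    return k

print(pixel_map_period(101), pixel_map_period_fast(250))
```

The direct method visits all $p^2$ points, so it becomes slow for large p. The matrix method needs only $\Pi(p)$ steps of 2 × 2 arithmetic.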

Figure 10.14.7 shows how the period of a pixel map varies with p. Although the general tendency is for the period to increase as p 
increases, there is a surprising amount of irregularity in the graph. Indeed, there is no simple function that specifies this 
relationship (see Exercise 1). 
Figure 10.14.7



Although a pixel map with p pixels on a side does not return to its initial configuration until $\Pi(p)$ iterations have occurred,
various unexpected things can occur at intermediate iterations. For example, Figure 10.14.8 shows a pixel map with p = 250 of
the famous Hungarian-American mathematician John von Neumann. It can be shown that $\Pi(250) = 750$; hence, the pixel map
will return to its initial configuration after 750 iterations of Arnold's cat map (but no sooner). However, after 375 iterations the 
pixel map is turned upside down, and after another 375 iterations (for a total of 750) the pixel map is returned to its initial 
configuration. Moreover, there are so many pixel points with periods that divide 750 that multiple ghostlike images of the original 
likeness occur at intermediate iterations; at 195 iterations numerous miniatures of the original likeness occur in diagonal rows. 




The Tiled Plane 






































































Our next objective is to explain the cause of the linear streaks that occur in Figure 10.14.3. For this purpose it will be helpful to 
view Arnold's cat map another way. As defined, Arnold's cat map is not a linear transformation because of the mod 1 arithmetic. 
However, there is an alternative way of defining Arnold's cat map that avoids the mod 1 arithmetic and results in a linear 
transformation. For this purpose, imagine that the unit square S with its picture of the cat is a “tile,” and suppose that the entire 
plane is covered with such tiles, as in Figure 10.14.9. We say that the xy-plane has been tiled with the unit square. If we apply the 
matrix transformation in Equation 1 to the entire tiled plane without performing the mod 1 arithmetic, then it can be shown that the portion
of the image within S will be identical to the image that we obtained using the mod 1 arithmetic (Figure 10.14.9). In short, the 
tiling results in the same pixel map in S as the mod 1 arithmetic, but in the tiled case Arnold's cat map is a linear transformation. 
Figure 10.14.9


It is important to understand, however, that tiling and mod 1 arithmetic produce periodicity in different ways. If a pixel map in S 
has period n , then in the case of mod 1 arithmetic, each point returns to its original position at the end of n iterations. In the case 
of tiling, points need not return to their original positions; rather, each point is replaced by a point of the same color at the end of n 
iterations. 


Properties of Arnold's Cat Map 


To understand the cause of the streaks in Figure 10.14.3, think of Arnold's cat map as a linear transformation on the tiled plane. 
Observe that the matrix 


$$C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

that defines Arnold's cat map is symmetric and has a determinant of 1. The fact that the determinant is 1 means that multiplication
by this matrix preserves areas; that is, the area of any figure in the plane and the area of its image are the same. This is also true
for figures in S in the case of mod 1 arithmetic, since the effect of the mod 1 arithmetic is to cut up the figure and reassemble the
pieces without any overlap, as shown in Figure 10.14.1d. Thus, in Figure 10.14.3 the area of the cat (whatever it is) is the same as
the total area of the blotches in each iteration.


The fact that the matrix is symmetric means that its eigenvalues are real and the corresponding eigenvectors are perpendicular. We 
leave it for you to show that the eigenvalues and corresponding eigenvectors of C are 

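$$\lambda_1 = \frac{3 + \sqrt{5}}{2} = 2.6180\ldots, \qquad \lambda_2 = \frac{3 - \sqrt{5}}{2} = 0.3819\ldots,$$

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1 + \sqrt{5}}{2} \end{bmatrix} = \begin{bmatrix} 1 \\ 1.6180\ldots \end{bmatrix}, \qquad \mathbf{v}_2 = \begin{bmatrix} -\frac{1 + \sqrt{5}}{2} \\ 1 \end{bmatrix} = \begin{bmatrix} -1.6180\ldots \\ 1 \end{bmatrix}$$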
For each application of Arnold's cat map, the eigenvalue λ₁ stretches the plane in the direction of the eigenvector v₁ by a factor of 2.6180.... The eigenvalue λ₂ compresses it in the direction of the eigenvector v₂ by a factor of 0.3819.... Figure 10.14.10 shows a square centered at the origin whose sides are parallel to the two eigenvector directions. Under this mapping, the square is deformed into a rectangle whose sides are also parallel to the two eigenvector directions. The square and the rectangle have the same area.
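The following Python sketch is a quick numerical check of these claims. It uses plain floating point and no linear-algebra library, and the names are ours and purely illustrative. It confirms three things:

- det(C) = 1.
- λ₁, λ₂ and v₁, v₂ satisfy Cv = λv.
- v₁ and v₂ are perpendicular.

```python
import math

# Cat map matrix from the text.
C = [[1, 1], [1, 2]]

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# det(C) = 1, so multiplication by C preserves area.
det_C = C[0][0] * C[1][1] - C[0][1] * C[1][0]

# Eigenvalues: roots of t^2 - 3t + 1 = 0 (trace 3, determinant 1).
root5 = math.sqrt(5)
lam1 = (3 + root5) / 2   # stretching factor, 2.6180...
lam2 = (3 - root5) / 2   # compression factor, 0.3819...

# Eigenvectors in the form given in the text.
v1 = [1.0, (1 + root5) / 2]
v2 = [-(1 + root5) / 2, 1.0]

def is_eigenpair(lam, v):
    """True if C v equals lam * v up to floating-point rounding."""
    return all(math.isclose(a, lam * b) for a, b in zip(mat_vec(C, v), v))

print("det(C) =", det_C)
print("eigenpairs ok:", is_eigenpair(lam1, v1), is_eigenpair(lam2, v2))
print("v1 . v2 =", v1[0] * v2[0] + v1[1] * v2[1])   # perpendicular eigenvectors
print("lam1 * lam2 =", lam1 * lam2)                  # det(C) up to rounding
```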



Figure 10.14.10 


To explain the cause of the streaks in Figure 10.14.3, consider S to be part of the tiled plane, and let p be a point of S with period n. Because we are considering tiling, there is a point q in the plane with the same color as p. On successive iterations q moves toward the position initially occupied by p, reaching that position on the nth iteration. This point is q = (C⁻¹)ⁿp = C⁻ⁿp, since

$$C^n\mathbf{q} = C^n(C^{-n}\mathbf{p}) = \mathbf{p}$$

Thus, with successive iterations, points of S flow away from their initial positions. At the same time, other points in the plane (with corresponding colors) flow toward those initial positions, completing their trip on the final iteration of the cycle. Figure 10.14.11 illustrates this in the case where n = 4, q = (−8/3, 5/3), and p = C⁴q = (1/3, 2/3). Note that p mod 1 = q mod 1 = (1/3, 2/3), so both points occupy the same positions on their respective tiles.

The outgoing point moves in the general direction of the eigenvector v₁, as indicated by the arrows in Figure 10.14.11. The incoming point moves in the general direction of the eigenvector v₂. It is the "flow lines" in the general directions of the eigenvectors that form the streaks in Figure 10.14.3.
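The point pair used with Figure 10.14.11 can be checked with exact rational arithmetic. This sketch uses Python's `fractions` module and takes C⁻¹ = [[2, −1], [−1, 1]], which is valid because det(C) = 1. It computes q = C⁻⁴p for p = (1/3, 2/3), then confirms that C⁴q = p and that q mod 1 = p.

```python
from fractions import Fraction

C = [[1, 1], [1, 2]]
C_INV = [[2, -1], [-1, 1]]   # inverse of C, since det(C) = 1

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(m, n):
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

n = 4
p = (Fraction(1, 3), Fraction(2, 3))
q = apply(mat_pow(C_INV, n), p)      # q = C^(-n) p
back = apply(mat_pow(C, n), q)       # C^n q should reproduce p
same_tile_position = (q[0] % 1, q[1] % 1) == p   # Fraction % 1 lies in [0, 1)

print(q)
print(back == p, same_tile_position)
```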
Figure 10.14.11
Nonperiodic Points

Thus far we have considered the effect of Arnold's cat map on pixel points of the form (m/p, n/p) for an arbitrary positive integer p. We know that all such points are periodic. We now consider the effect of Arnold's cat map on an arbitrary point (a, b) in S. We classify such points as rational if the coordinates a and b are both rational numbers, and irrational if at least one of the coordinates is irrational.

Every rational point is periodic, since it is a pixel point for a suitable choice of p. For example, the rational point (r₁/s₁, r₂/s₂) can be written as (r₁s₂/(s₁s₂), r₂s₁/(s₁s₂)), so it is a pixel point with p = s₁s₂. It can be shown (Exercise 13) that the converse is also true: every periodic point must be a rational point.





















































It follows from the preceding discussion that the irrational points in S are nonperiodic. Hence the successive iterates of an irrational point (x₀, y₀) in S must all be distinct points in S. Figure 10.14.12, which was computer generated, shows an irrational point and selected iterates up to 100,000. For the particular irrational point that we selected, the iterates do not seem to cluster in any particular region of S. Rather, they appear to be spread throughout S, becoming denser with successive iterations.




Figure 10.14.12 


The behavior of the iterates in Figure 10.14.12 is sufficiently important that there is some terminology associated with it. We say 
that a set D of points in S is dense in S if every circle centered at any point of S encloses points of D , no matter how small the 
radius of the circle is taken (Figure 10.14.13). It can be shown that the rational points are dense in S and the iterates of most (but 
not all) of the irrational points are dense in S. 
Definition of Chaos

We know that under Arnold's cat map, the rational points of S are periodic and dense in S, and that some but not all of the irrational points have iterates that are dense in S. These are the basic ingredients of chaos. There are several definitions of chaos in current use. The following one is most closely related to our work; it is an outgrowth of a definition introduced by Robert L. Devaney in 1986 in his book An Introduction to Chaotic Dynamical Systems (Benjamin/Cummings Publishing Company).



DEFINITION 1 

A mapping T of S onto itself is said to be chaotic if: 

(i) S contains a dense set of periodic points of the mapping T. 

(ii) There is a point in S whose iterates under T are dense in S. 



Thus Arnold's cat map satisfies the definition of a chaotic mapping. What is noteworthy about this definition is that a chaotic 
mapping exhibits an element of order and an element of disorder—the periodic points move regularly in cycles, but the points 
with dense iterates move irregularly, often obscuring the regularity of the periodic points. This fusion of order and disorder 
characterizes chaotic mappings. 


Dynamical Systems 

Chaotic mappings arise in the study of dynamical systems. Informally stated, a dynamical system can be viewed as a system that 
has a specific state or configuration at each point of time but that changes its state with time. Chemical systems, ecological 
systems, electrical systems, biological systems, economic systems, and so forth can be looked at in this way. In a discrete-time
dynamical system, the state changes at discrete points of time rather than at each instant. In a discrete-time chaotic dynamical
system, each state results from a chaotic mapping of the preceding state. For example, if one imagines that Arnold's cat map is
applied at discrete points of time, then the pixel maps in Figure 10.14.3 can be viewed as the evolution of a discrete-time chaotic 
dynamical system from some initial set of states (each point of the cat is a single initial state) to successive sets of states. 

One of the fundamental problems in the study of dynamical systems is to predict future states of the system from a known initial 
state. In practice, however, the exact initial state is rarely known because of errors in the devices used to measure the initial state. 
It was believed at one time that if the measuring devices were sufficiently accurate and the computers used to perform the 
iteration were sufficiently powerful, then one could predict the future states of the system to any degree of accuracy. But the 
discovery of chaotic systems shattered this belief because it was found that for such systems the slightest error in measuring the 
initial state or in the computation of the iterates becomes magnified exponentially, thereby preventing an accurate prediction of 
future states. Let us demonstrate this sensitivity to initial conditions with Arnold's cat map. 

Suppose that P₀ is a point in the xy-plane whose exact coordinates are (0.77837, 0.70904). A measurement error of 0.00001 is made in the y-coordinate, so that the point is thought to be located at (0.77837, 0.70905), which we denote by Q₀. Both P₀ and Q₀ are pixel points with p = 100,000 (why?). Since Π(100,000) = 75,000, both return to their initial positions after 75,000 iterations.

In Figure 10.14.14 we show the first 50 iterates of P₀ under Arnold's cat map as crosses and the first 50 iterates of Q₀ as circles. P₀ and Q₀ are close enough that their symbols overlap initially. However, only their first eight iterates have overlapping symbols; from the ninth iteration on, their iterates follow divergent paths.
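Because both points are pixel points with p = 100,000, their iterates can be followed exactly by storing integer numerators. The sketch below is an illustration under that assumption. The wrapped max-coordinate distance in `separation` is our own choice of metric, not necessarily the one used to draw Figure 10.14.14.

Since the map is linear, the difference of the two iterates after k steps is Cᵏ applied to (0, 1), reduced mod 1. That difference grows rapidly even though each orbit is periodic.

```python
M = 100_000  # both P0 and Q0 are pixel points with p = 100,000

def cat_step(point, m=M):
    """One application of the cat map to the pixel point (x/m, y/m)."""
    x, y = point
    return ((x + y) % m, (x + 2 * y) % m)

def iterate(point, n, m=M):
    for _ in range(n):
        point = cat_step(point, m)
    return point

def separation(p, q, m=M):
    """Wrapped max-coordinate distance between two pixel points, in units of the side of S."""
    def gap(a, b):
        d = (a - b) % m
        return min(d, m - d)
    return max(gap(p[0], q[0]), gap(p[1], q[1])) / m

P0 = (77837, 70904)  # (0.77837, 0.70904), the exact point
Q0 = (77837, 70905)  # (0.77837, 0.70905), the mismeasured point

p, q = P0, Q0
history = []         # (Q_k - P_k) mod M after each step
for k in range(1, 13):
    p, q = cat_step(p), cat_step(q)
    history.append(((q[0] - p[0]) % M, (q[1] - p[1]) % M))
    print(k, separation(p, q))
```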
Figure 10.14.14


It is possible to quantify the growth of the error from the eigenvalues and eigenvectors of Arnold's cat map. For this purpose we will think of Arnold's cat map as a linear transformation on the tiled plane. Recall from Figure 10.14.10 and the related discussion that the projected distance between two points in S in the direction of the eigenvector v₁ increases by a factor of 2.6180... (= λ₁) with each iteration (Figure 10.14.15). After nine iterations this projected distance increases by a factor of

$$(2.6180\ldots)^9 = 5777.99\ldots$$

With an initial error of roughly 1/100,000 in the direction of v₁, this distance is 0.05777..., or about 1/17 the width of the unit square S. After 12 iterations this small initial error grows to

$$\frac{(2.6180\ldots)^{12}}{100{,}000} = 1.0368\ldots$$

which is greater than the width of S. Thus, because of the exponential growth of the initial error, we completely lose track of the true iterates within S after 12 iterations.
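The arithmetic in this paragraph can be reproduced directly. This short Python sketch assumes, as the text does, an initial error of 1/100,000 along v₁ and a growth factor of λ₁ per iteration.

```python
import math

lam1 = (3 + math.sqrt(5)) / 2      # stretching eigenvalue, 2.6180...
initial_error = 1 / 100_000        # measurement error from the example

for n in (9, 12):
    factor = lam1 ** n
    print(n, factor, factor * initial_error)

# First iteration count at which the projected error exceeds the width of S (= 1).
n_lost = next(n for n in range(1, 100) if lam1 ** n * initial_error > 1)
print("error exceeds the width of S after", n_lost, "iterations")
```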
Figure 10.14.15

Although sensitivity to initial conditions limits the ability to predict the future evolution of dynamical systems, new techniques 
are presently being investigated to describe this future evolution in alternative ways. 


Exercise Set 10.14 

1. In a journal article [F. J. Dyson and H. Falk, "Period of a Discrete Cat Mapping," The American Mathematical Monthly, 99 (August-September 1992), pp. 603-614] the following results concerning the nature of the function Π(p) were established:

(i) Π(p) = 3p if and only if p = 2 · 5ᵏ for k = 1, 2, ....

(ii) Π(p) = 2p if and only if p = 5ᵏ for k = 1, 2, ... or p = 6 · 5ᵏ for k = 0, 1, 2, ....

(iii) Π(p) ≤ 12p/7 for all other choices of p.

Find Π(250), Π(25), Π(125), Π(30), Π(10), Π(50), Π(3750), Π(6), and Π(5).

Answer: 

Π(250) = 750, Π(25) = 50, Π(125) = 250, Π(30) = 60, Π(10) = 30, Π(50) = 150, Π(3750) = 7500, Π(6) = 12,
Π(5) = 10

2. Find all the n-cycles that are subsets of the 36 points in S of the form (m/6, n/6) with m and n in the range 0, 1, 2, 3, 4, 5.
Then find Π(6).

Answer: 

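The 36 points split into the following cycles, so Π(6) = 12:

- one 1-cycle, {(0, 0)};
- one 3-cycle, {(1/2, 0), (1/2, 1/2), (0, 1/2)};
- two 4-cycles, {(1/3, 0), (1/3, 1/3), (2/3, 0), (2/3, 2/3)} and {(0, 1/3), (1/3, 2/3), (0, 2/3), (2/3, 1/3)};
- two 12-cycles containing the remaining 24 points.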
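The cycles can also be found by brute force. This Python sketch (function names are ours) works as follows:

- It stores each pixel point (m/6, n/6) by its numerators (m, n).
- It follows each orbit until the orbit closes.
- It takes Π(6) as the least common multiple of the cycle lengths.

```python
from functools import reduce
from math import gcd

def cat_step(point, m):
    """Cat map on the pixel point (x/m, y/m), kept in integer units."""
    x, y = point
    return ((x + y) % m, (x + 2 * y) % m)

def cat_cycles(m):
    """Split the m*m pixel points into the cycles of the cat map."""
    seen, cycles = set(), []
    for start in ((i, j) for i in range(m) for j in range(m)):
        if start in seen:
            continue
        cycle, point = [], start
        while point not in seen:      # the map is a bijection, so the orbit closes at start
            seen.add(point)
            cycle.append(point)
            point = cat_step(point, m)
        cycles.append(cycle)
    return cycles

cycles6 = cat_cycles(6)
for cycle in cycles6:
    print(len(cycle), cycle)          # coordinates are numerators over 6
period6 = reduce(lambda a, b: a * b // gcd(a, b), (len(c) for c in cycles6))
print("Pi(6) =", period6)
```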
3. (Fibonacci Shift-Register Random-Number Generator) A well-known method of generating a sequence of "pseudorandom" integers x₀, x₁, x₂, x₃, ... in the interval from 0 to p − 1 is based on the following algorithm:

(i) Pick any two integers x₀ and x₁ from the range 0, 1, 2, ..., p − 1.

(ii) Set xₙ₊₁ = (xₙ + xₙ₋₁) mod p for n = 1, 2, ....

Here x mod p denotes the number in the interval from 0 to p − 1 that differs from x by a multiple of p. For example:

- 35 mod 9 = 8, because 8 = 35 − 3 · 9;
- 36 mod 9 = 0, because 0 = 36 − 4 · 9;
- −3 mod 9 = 6, because 6 = −3 + 1 · 9.

(a) Generate the sequence of pseudorandom numbers that results from the choices p = 15, x₀ = 3, and x₁ = 7 until the sequence starts repeating.

(b) Show that the following formula is equivalent to step (ii) of the algorithm:

$$\begin{bmatrix} x_{n+1} \\ x_{n+2} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_{n-1} \\ x_n \end{bmatrix} \bmod p \quad \text{for } n = 1, 2, 3, \ldots$$

(c) Use the formula in part (b) to generate the sequence of vectors for the choices p = 21, x₀ = 5, and x₁ = 5 until the sequence starts repeating.


Answer: 


(a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, ...
(c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5), ...
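Parts (a) and (c) can be checked mechanically. This sketch implements step (ii) and the matrix form in part (b) directly. It stops as soon as the starting pair reappears, which must happen because the map on pairs is invertible mod p.

```python
def shift_register(p, x0, x1):
    """Terms x0, x1, x2, ... of x_{n+1} = (x_n + x_{n-1}) mod p, for one full period."""
    seq = [x0, x1]
    while True:
        seq.append((seq[-1] + seq[-2]) % p)
        if seq[-2:] == [x0, x1]:      # the starting pair has come back
            return seq[:-2]

def pair_sequence(p, x0, x1):
    """Vectors (x_{n-1}, x_n) advanced by the matrix [[1, 1], [1, 2]] mod p, for one period."""
    pairs = [(x0, x1)]
    while True:
        a, b = pairs[-1]
        nxt = ((a + b) % p, (a + 2 * b) % p)
        if nxt == pairs[0]:
            return pairs
        pairs.append(nxt)

print(shift_register(15, 3, 7))
print(pair_sequence(21, 5, 5))
```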


If we take p = 1 and pick x₀ and x₁ from the interval [0, 1), then the above random-number generator produces pseudorandom numbers in the interval [0, 1). The resulting scheme is precisely Arnold's cat map.

Furthermore, if we eliminate the modular arithmetic in the algorithm and take x₀ = x₁ = 1, then the resulting sequence of integers is the famous Fibonacci sequence, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, .... In this sequence each number after the first two is the sum of the preceding two numbers.


4. For

$$C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

it can be verified that

$$C^{25} = \begin{bmatrix} 7{,}778{,}742{,}049 & 12{,}586{,}269{,}025 \\ 12{,}586{,}269{,}025 & 20{,}365{,}011{,}074 \end{bmatrix}$$

It can also be verified that 12,586,269,025 is divisible by 101, and that when 7,778,742,049 and 20,365,011,074 are divided by 101, the remainder is 1.


(a) Show that every point in S of the form (m/101, n/101) returns to its starting position after 25 iterations under Arnold's cat map.

(b) Show that every point in S of the form (m/101, n/101) has period 1, 5, or 25.

(c) Show that the point (1/101, 0) has period greater than 5 by iterating it five times.

(d) Show that Π(101) = 25.


Answer: 


(c) The first five iterates of (1/101, 0) are (1/101, 1/101), (2/101, 3/101), (5/101, 8/101), (13/101, 21/101), and (34/101, 55/101).
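Exact integer arithmetic makes these verifications routine. This Python sketch does three things:

- It computes C²⁵ without reduction.
- It repeats the computation mod 101.
- It lists the first five iterates of (1/101, 0) as numerators over 101.

```python
def mat_mul(a, b, m=None):
    """2x2 matrix product, optionally reduced mod m."""
    prod = [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    if m is not None:
        prod = [[entry % m for entry in row] for row in prod]
    return prod

def mat_pow(a, n, m=None):
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = mat_mul(result, a, m)
    return result

C = [[1, 1], [1, 2]]
IDENTITY = [[1, 0], [0, 1]]

C25 = mat_pow(C, 25)                 # exact Python integers
print(C25)
print(mat_pow(C, 25, 101))           # reduces to the identity mod 101

# Iterates of (1/101, 0), stored as numerators over 101.
iterates = []
point = (1, 0)
for _ in range(5):
    point = ((point[0] + point[1]) % 101, (point[0] + 2 * point[1]) % 101)
    iterates.append(point)
print(iterates)
```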


Show that for the mapping T:S—*S defined by T(x, 7 ) = y j mod 1 , every point in S is a periodic point. Why does 

this show that the mapping is not chaotic? 

6. An Anosov automorphism on R² is a mapping from the unit square S onto S of the form

$$\begin{bmatrix} x \\ y \end{bmatrix} \mapsto \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod 1$$

in which

(i) a, b, c, and d are integers,

(ii) the determinant of the matrix is ±1, and

(iii) the eigenvalues of the matrix do not have magnitude 1.

It can be shown that all Anosov automorphisms are chaotic mappings.

(a) Show that Arnold's cat map is an Anosov automorphism. 

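(b) Which of the following are the matrices of an Anosov automorphism?

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}, \quad \begin{bmatrix} 5 & 7 \\ 2 & 3 \end{bmatrix}, \quad \begin{bmatrix} 6 & 2 \\ 5 & 2 \end{bmatrix}$$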


(c) Show that the following mapping of S onto S is not an Anosov automorphism.

$$\begin{bmatrix} x \\ y \end{bmatrix} \mapsto \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod 1$$

What is the geometric effect of this transformation on S? Use your observation to show that the mapping is not a chaotic mapping, by showing that all points in S are periodic points.


Answer: 

(b) The matrices of Anosov automorphisms are

$$\begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 5 & 7 \\ 2 & 3 \end{bmatrix}$$

(c) The transformation effects a rotation of S through 90° in the clockwise direction.
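Conditions (ii) and (iii) are easy to test numerically for a 2 × 2 integer matrix. In this sketch the eigenvalues come from the quadratic formula with `cmath.sqrt`, so complex eigenvalues (such as those of a rotation) are handled. The tolerance 1e-12 is an assumption that is adequate for small integer entries.

```python
import cmath

def is_anosov(a, b, c, d):
    """Check conditions (ii) and (iii) for the integer matrix [[a, b], [c, d]].

    Condition (i), integer entries, is assumed by passing integers.
    """
    det = a * d - b * c
    if det not in (1, -1):
        return False
    trace = a + d
    root = cmath.sqrt(trace * trace - 4 * det)
    eigenvalues = ((trace + root) / 2, (trace - root) / 2)
    # Neither eigenvalue may lie on the unit circle.
    return all(abs(abs(lam) - 1) > 1e-12 for lam in eigenvalues)

candidates = [(1, 0, 0, 1), (0, 1, 1, 0), (3, 2, 1, 1), (5, 7, 2, 3), (6, 2, 5, 2)]
print([is_anosov(*m) for m in candidates])
print(is_anosov(1, 1, 1, 2))     # Arnold's cat map
print(is_anosov(0, 1, -1, 0))    # the 90-degree rotation in part (c)
```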

7. Show that Arnold's cat map is one-to-one over the unit square S and that its range is S. 

8. Show that the inverse of Arnold's cat map is given by

$$\Gamma^{-1}(x, y) = (2x - y,\; -x + y) \bmod 1$$

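9. Show that the unit square S can be partitioned into four triangular regions on each of which Arnold's cat map is a transformation of the form

$$\begin{bmatrix} x \\ y \end{bmatrix} \mapsto \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} a \\ b \end{bmatrix}$$

where a and b need not be the same for each region. [Hint: Find the regions in S that map onto the four shaded regions of the parallelogram in Figure 10.14.1d.]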


Answer: 

[Figure: the unit square S divided into four triangular regions I, II, III, and IV, with the points (0, 0), (0, 1/2), and (0, 1) marked; the answer gives the transformation separately for each region.]
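One way to see the four regions is to record which integer vector the mod 1 arithmetic subtracts at each point. In this sketch, (a, b) = (⌊x + y⌋, ⌊x + 2y⌋), so the region boundaries are the lines x + 2y = 1, x + y = 1, and x + 2y = 2. The grid size is arbitrary, and the order in which the regions appear need not match the labels I to IV in the figure.

```python
from fractions import Fraction
from math import floor

def cat_translation(x, y):
    """Integer vector (a, b) removed by mod 1, so that the image is (x + y - a, x + 2y - b)."""
    return floor(x + y), floor(x + 2 * y)

# Sample a grid of rational points in S and collect the distinct translations.
N = 60
translations = {cat_translation(Fraction(i, N), Fraction(j, N))
                for i in range(N) for j in range(N)}
print(sorted(translations))
```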
10. If (x₀, y₀) is a point in S and (xₙ, yₙ) is its nth iterate under Arnold's cat map, show that

$$\begin{bmatrix} x_n \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^n \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1$$

This result implies that the modular arithmetic need only be performed once rather than after each iteration.

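11. Show that (0, 0) is the only fixed point of Arnold's cat map by showing that the only solution of the equation

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1$$

with 0 ≤ x₀ < 1 and 0 ≤ y₀ < 1 is x₀ = y₀ = 0. [Hint: For appropriate nonnegative integers r and s, we can write

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix}$$

for the preceding equation.]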

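12. Find all 2-cycles of Arnold's cat map by finding all solutions of the equation

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^2 \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1$$

with 0 ≤ x₀ < 1 and 0 ≤ y₀ < 1. [Hint: For appropriate nonnegative integers r and s, we can write

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix}$$

for the preceding equation.]


Answer:

(2/5, 1/5) and (3/5, 4/5) form one 2-cycle, and (4/5, 2/5) and (1/5, 3/5) form another 2-cycle.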
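The hint in Exercises 11 and 12 leads to a finite search:

- Write M = Cᵏ − I. Every solution satisfies M(x₀, y₀) = (r, s) with r and s integers.
- Because M⁻¹ = adj(M)/det(M), each solution has the form (m/d, n/d) with d = |det M|.

The sketch below (names are ours) uses that observation.

```python
from fractions import Fraction

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def periodic_points(k):
    """All (x0, y0) in S with C^k (x0, y0) = (x0, y0) mod 1, for C = [[1, 1], [1, 2]].

    Every solution of M v = integer vector, with M = C^k - I, has the form
    (m/d, n/d) with d = |det(M)|, so a finite search over 0 <= m, n < d suffices.
    """
    Ck = [[1, 0], [0, 1]]
    for _ in range(k):
        Ck = mat_mul(Ck, [[1, 1], [1, 2]])
    M = [[Ck[0][0] - 1, Ck[0][1]], [Ck[1][0], Ck[1][1] - 1]]
    d = abs(M[0][0] * M[1][1] - M[0][1] * M[1][0])   # nonzero: C^k has no eigenvalue 1
    return [(Fraction(m, d), Fraction(n, d))
            for m in range(d) for n in range(d)
            if (M[0][0] * m + M[0][1] * n) % d == 0
            and (M[1][0] * m + M[1][1] * n) % d == 0]

print(periodic_points(1))   # only the fixed point (0, 0)
print(periodic_points(2))   # (0, 0) plus the four points of the two 2-cycles
```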


13. Show that every periodic point of Arnold's cat map must be a rational point by showing that for all solutions of the equation

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^n \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1$$

the numbers x₀ and y₀ are quotients of integers.

14. Let T be Arnold's cat map applied five times in a row. Figure Ex-14 represents four successive mappings of T on the first image, each image having a resolution of 101 × 101 pixels. The fifth mapping returns to the first image because this cat map has a period of 25. Explain how you might generate this particular sequence of images.



Figure Ex-14 


Answer: 

Begin with a 101 × 101 array of white pixels and add the letter 'A' in black pixels to it. Apply the mapping to this image, which will scatter the black pixels throughout the image. Then superimpose the letter 'B' in black pixels onto this image. Apply the mapping again, and then superimpose the letter 'C' in black pixels onto the resulting image. Repeat this procedure with the letters 'D' and 'E'. The next application of the mapping will return you to the letter 'A', with the pixels for the letters 'B' through 'E' scattered in the background.

Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica,
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 


T1. The methods of Exercise 4 show that for the cat map, Π(p) is the smallest positive integer n satisfying the equation

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^n \bmod p = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

This suggests that one way to determine Π(p) is to compute

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^n \bmod p$$

starting with n = 1 and stopping when this produces the identity matrix. Use this idea to compute Π(p) for p = 2, 3, .... Compare your results to the formulas given in Exercise 1, if they apply. What can you conjecture about

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)/2} \bmod p$$

when Π(p) is even?
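The procedure in T1 can be transcribed directly, here in Python rather than one of the packages listed above. Reducing the entries mod p at every step keeps the numbers small. The function name is ours.

```python
def cat_period(p):
    """Smallest n >= 1 with C^n = I (mod p) for C = [[1, 1], [1, 2]]; requires p >= 2."""
    if p < 2:
        raise ValueError("p must be at least 2")
    a, b, c, d = 1, 1, 1, 2          # entries of C^1
    n = 1
    while (a % p, b % p, c % p, d % p) != (1, 0, 0, 1):
        # Right-multiply the current power by C, reducing mod p as we go.
        a, b, c, d = (a + b) % p, (a + 2 * b) % p, (c + d) % p, (c + 2 * d) % p
        n += 1
    return n

for p in range(2, 11):
    print(p, cat_period(p))
```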

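T2. The eigenvalues and eigenvectors for the cat map matrix

$$C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

are

$$\lambda_1 = \frac{3+\sqrt{5}}{2}, \quad \lambda_2 = \frac{3-\sqrt{5}}{2}, \quad \mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1+\sqrt{5}}{2} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ \frac{1-\sqrt{5}}{2} \end{bmatrix}$$

Using these eigenvalues and eigenvectors, we can define

$$D = \begin{bmatrix} \frac{3+\sqrt{5}}{2} & 0 \\ 0 & \frac{3-\sqrt{5}}{2} \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 1 & 1 \\ \frac{1+\sqrt{5}}{2} & \frac{1-\sqrt{5}}{2} \end{bmatrix}$$

and write C = PDP⁻¹; hence, Cⁿ = PDⁿP⁻¹. Use a computer to show that

$$C^n = \begin{bmatrix} c_{11}^{(n)} & c_{12}^{(n)} \\ c_{21}^{(n)} & c_{22}^{(n)} \end{bmatrix}$$

where

$$c_{11}^{(n)} = \frac{1+\sqrt{5}}{2\sqrt{5}}\left(\frac{3-\sqrt{5}}{2}\right)^n - \frac{1-\sqrt{5}}{2\sqrt{5}}\left(\frac{3+\sqrt{5}}{2}\right)^n$$

$$c_{12}^{(n)} = c_{21}^{(n)} = \frac{1}{\sqrt{5}}\left[\left(\frac{3+\sqrt{5}}{2}\right)^n - \left(\frac{3-\sqrt{5}}{2}\right)^n\right]$$

$$c_{22}^{(n)} = \frac{1+\sqrt{5}}{2\sqrt{5}}\left(\frac{3+\sqrt{5}}{2}\right)^n - \frac{1-\sqrt{5}}{2\sqrt{5}}\left(\frac{3-\sqrt{5}}{2}\right)^n$$

How can you use these results and your conclusions in Exercise T1 to simplify the method for computing Π(p)?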
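One way to carry out T2 in Python is to evaluate the closed forms in floating point and compare them, after rounding, with exactly computed integer powers of C. The rounding check is only trustworthy while the entries stay well below 2⁵³, so this sketch checks n only up to 20.

```python
import math

ROOT5 = math.sqrt(5)
LAM1 = (3 + ROOT5) / 2
LAM2 = (3 - ROOT5) / 2

def closed_form(n):
    """Entries of C^n from the diagonalization C = P D P^-1 (floating point)."""
    c11 = (1 + ROOT5) / (2 * ROOT5) * LAM2 ** n - (1 - ROOT5) / (2 * ROOT5) * LAM1 ** n
    c12 = (LAM1 ** n - LAM2 ** n) / ROOT5
    c22 = (1 + ROOT5) / (2 * ROOT5) * LAM1 ** n - (1 - ROOT5) / (2 * ROOT5) * LAM2 ** n
    return [[c11, c12], [c12, c22]]

def exact_power(n):
    """C^n computed with exact integer arithmetic."""
    a, b, c, d = 1, 0, 0, 1
    for _ in range(n):
        a, b, c, d = a + b, a + 2 * b, c + d, c + 2 * d
    return [[a, b], [c, d]]

def agrees_up_to(n_max):
    """True if the rounded closed form matches the exact power for n = 0..n_max."""
    return all(round(closed_form(n)[i][j]) == exact_power(n)[i][j]
               for n in range(n_max + 1) for i in range(2) for j in range(2))

print(agrees_up_to(20))
```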
10.15 Cryptography

In this section we present a method of encoding and decoding messages. We also examine modular arithmetic and show 
how Gaussian elimination can sometimes be used to break an opponent's code. 


Prerequisites 

Matrices 

Gaussian Elimination 

Matrix Operations 

Linear Independence 

Linear Transformations (Section 4.9) 


Ciphers 

The study of encoding and decoding secret messages is called cryptography. Although secret codes date to the earliest days 
of written communication, there has been a recent surge of interest in the subject because of the need to maintain the 
privacy of information transmitted over public lines of communication. In the language of cryptography, codes are called 
ciphers, uncoded messages are called plaintext, and coded messages are called ciphertext. The process of converting from
plaintext to ciphertext is called enciphering, and the reverse process of converting from ciphertext to plaintext is called
deciphering.

The simplest ciphers, called substitution ciphers, are those that replace each letter of the alphabet by a different letter. For
example, in the substitution cipher

Plain   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

the plaintext letter A is replaced by D, the plaintext letter B by E, and so forth. With this cipher the plaintext message

ROME WAS NOT BUILT IN A DAY

becomes

URPH ZDV QRW EXLOW LQ D GDB
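The substitution cipher above is a shift of three places. Here is a minimal Python sketch, assuming uppercase input and leaving spaces unchanged:

```python
import string

def shift_encipher(plaintext, shift=3):
    """Replace each letter by the letter `shift` places later, wrapping X, Y, Z to A, B, C."""
    out = []
    for ch in plaintext:
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # spaces and punctuation pass through
    return "".join(out)

print(shift_encipher("ROME WAS NOT BUILT IN A DAY"))
```

Because the same plaintext letter always maps to the same cipher letter, letter frequencies survive unchanged. That is the weakness the next subsection addresses.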


Hill Ciphers 

A disadvantage of substitution ciphers is that they preserve the frequencies of individual letters, making it relatively easy to 
break the code by statistical methods. One way to overcome this problem is to divide the plaintext into groups of letters and 
encipher the plaintext group by group, rather than one letter at a time. A system of cryptography in which the plaintext is 
divided into sets of n letters, each of which is replaced by a set of n cipher letters, is called a polygraphic system. In this 
section we will study a class of polygraphic systems based on matrix transformations. [The ciphers that we will discuss are 
called Hill ciphers after Lester S. Hill, who introduced them in two papers: “Cryptography in an Algebraic Alphabet,” 
American Mathematical Monthly, 36 (June-July 1929), pp. 306-312; and “Concerning Certain Linear Transformation 
Apparatus of Cryptography,” American Mathematical Monthly, 38 (March 1931), pp. 135-154.] 


In the discussion to follow, we assume that each plaintext and ciphertext letter except Z is assigned the numerical value that 
specifies its position in the standard alphabet (Table 1). For reasons that will become clear later, Z is assigned a value of 
zero. 


Table 1 


A  B  C  D  E  F  G  H  I  J  K  L  M
1  2  3  4  5  6  7  8  9  10 11 12 13

N  O  P  Q  R  S  T  U  V  W  X  Y  Z
14 15 16 17 18 19 20 21 22 23 24 25 0


In the simplest Hill ciphers, successive pairs of plaintext letters are transformed into ciphertext by the following procedure:


Step 1 Choose a 2 × 2 matrix with integer entries

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

to perform the encoding. Certain additional conditions on A will be imposed later.

Step 2 Group successive plaintext letters into pairs, adding an arbitrary “dummy” letter to fill out the last pair if the 
plaintext has an odd number of letters, and replace each plaintext letter by its numerical value. 

Step 3 Successively convert each plaintext pair p₁p₂ into a column vector

$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}$$

and form the product Ap. We will call p a plaintext vector and Ap the corresponding ciphertext vector.
Step 4 Convert each ciphertext vector into its alphabetic equivalent. 


EXAMPLE 1 Hill Cipher of a Message 

Use the matrix

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$$

to obtain the Hill cipher for the plaintext message

I AM HIDING


If we group the plaintext into pairs and add the dummy letter G to fill out the last pair, we obtain

IA MH ID IN GG

or, equivalently, from Table 1,

9 1 13 8 9 4 9 14 7 7

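To encipher the pair IA, we form the matrix product

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 \\ 3 \end{bmatrix}$$

which, from Table 1, yields the ciphertext KC.

To encipher the pair MH, we form the product

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 13 \\ 8 \end{bmatrix} = \begin{bmatrix} 29 \\ 24 \end{bmatrix} \tag{1}$$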


However, there is a problem here, because the number 29 has no alphabet equivalent (Table 1). To resolve this problem, we make the following agreement:

Whenever an integer greater than 25 occurs, it will be replaced by the remainder that results when this integer is divided by 26.

Because the remainder after division by 26 is one of the integers 0, 1, 2, ..., 25, this procedure will always yield an integer with an alphabet equivalent.

Thus, in (1) we replace 29 by 3, which is the remainder after dividing 29 by 26. It now follows from Table 1 that the ciphertext for the pair MH is CX.

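The computations for the remaining ciphertext vectors are

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 4 \end{bmatrix} = \begin{bmatrix} 17 \\ 12 \end{bmatrix}$$

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 14 \end{bmatrix} = \begin{bmatrix} 37 \\ 42 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 11 \\ 16 \end{bmatrix}$$

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 7 \\ 7 \end{bmatrix} = \begin{bmatrix} 21 \\ 21 \end{bmatrix}$$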



These correspond to the ciphertext pairs QL, KP, and UU, respectively. In summary, the entire ciphertext
message is

KC CX QL KP UU 
which would usually be transmitted as a single string without spaces: 

KCCXQLKPUU 
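Steps 1 to 4 can be sketched in Python as follows. The helper names are ours. Table 1 is encoded as A = 1, ..., Y = 25, Z = 0, and taking each result `% 26` implements the agreement about integers greater than 25 (for the nonnegative values that occur here).

```python
def letter_to_num(ch):
    """Table 1: A=1, ..., Y=25, Z=0."""
    return (ord(ch) - ord("A") + 1) % 26

def num_to_letter(k):
    return "Z" if k == 0 else chr(ord("A") + k - 1)

def hill2_encipher(plaintext, A, dummy="G"):
    """Hill 2-cipher: encipher letter pairs with the 2x2 matrix A, working mod 26."""
    letters = [ch for ch in plaintext.upper() if ch.isalpha()]
    if len(letters) % 2:
        letters.append(dummy)             # Step 2: pad an odd-length message
    out = []
    for i in range(0, len(letters), 2):
        p1, p2 = letter_to_num(letters[i]), letter_to_num(letters[i + 1])
        c1 = (A[0][0] * p1 + A[0][1] * p2) % 26   # Step 3: ciphertext vector A p
        c2 = (A[1][0] * p1 + A[1][1] * p2) % 26
        out += [num_to_letter(c1), num_to_letter(c2)]   # Step 4
    return "".join(out)

print(hill2_encipher("I AM HIDING", [[1, 2], [0, 3]]))
```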


Because the plaintext was grouped in pairs and enciphered by a 2 × 2 matrix, the Hill cipher in Example 1 is referred to as a
Hill 2-cipher. It is also possible to group the plaintext in triples and encipher by a 3 × 3 matrix with integer
entries; this is called a Hill 3-cipher. In general, for a Hill n-cipher, plaintext is grouped into sets of n letters and
enciphered by an n × n matrix with integer entries.


Modular Arithmetic 

In Example 1, integers greater than 25 were replaced by their remainders after division by 26. This technique of working 
with remainders is at the core of a body of mathematics called modular arithmetic. Because of its importance in 
cryptography, we will digress for a moment to touch on some of the main ideas in this area. 

In modular arithmetic we are given a positive integer m , called the modulus , and any two integers whose difference is an 
integer multiple of the modulus are regarded as “equal” or “equivalent” with respect to the modulus. More precisely, we 
make the following definition. 


DEFINITION 1 

If m is a positive integer and a and b are any integers, then we say that a is equivalent to b modulo m, written

a ≡ b (mod m)

if a − b is an integer multiple of m.






















EXAMPLE 2 Various Equivalences 


7 ≡ 2 (mod 5)

19 ≡ 3 (mod 2)

−1 ≡ 25 (mod 26)

12 ≡ 0 (mod 4)


For any modulus m it can be proved that every integer a is equivalent, modulo m, to exactly one of the integers

0, 1, 2, ..., m − 1

We call this integer the residue of a modulo m, and we write

$$\mathbb{Z}_m = \{0, 1, 2, \ldots, m - 1\}$$

to denote the set of residues modulo m.

If a is a nonnegative integer, then its residue modulo m is simply the remainder that results when a is divided by m. For an 
arbitrary integer a , the residue can be found using the following theorem. 


THEOREM 10.15.1 

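For any integer a and modulus m, let

R = remainder of |a|/m

Then the residue r of a modulo m is given by

$$r = \begin{cases} R & \text{if } a \ge 0 \\ m - R & \text{if } a < 0 \text{ and } R \ne 0 \\ 0 & \text{if } a < 0 \text{ and } R = 0 \end{cases}$$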

EXAMPLE 3 Residues mod 26

Find the residue modulo 26 of (a) 87, (b) −38, and (c) −26.

Solution

(a) Dividing |87| = 87 by 26 yields a remainder of R = 9, so r = 9. Thus,

87 ≡ 9 (mod 26)

(b) Dividing |−38| = 38 by 26 yields a remainder of R = 12, so r = 26 − 12 = 14. Thus,

−38 ≡ 14 (mod 26)

(c) Dividing |−26| = 26 by 26 yields a remainder of R = 0. Thus,

−26 ≡ 0 (mod 26)
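Theorem 10.15.1 translates directly into code. The sketch below also compares the result with Python's built-in `%` operator, which for a positive modulus already returns a value in the range 0 to m − 1.

```python
def residue(a, m):
    """Residue of a modulo m, following Theorem 10.15.1."""
    R = abs(a) % m               # remainder of |a| / m
    if a >= 0:
        return R
    return m - R if R != 0 else 0

for a in (87, -38, -26):
    print(a, residue(a, 26), a % 26)
```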


In ordinary arithmetic every nonzero number a has a reciprocal or multiplicative inverse, denoted by a⁻¹, such that

aa⁻¹ = a⁻¹a = 1

In modular arithmetic we have the following corresponding concept: 


DEFINITION 2 

If a is a number in Zₘ, then a number a⁻¹ in Zₘ is called a reciprocal or multiplicative inverse of a modulo m if
aa⁻¹ ≡ a⁻¹a ≡ 1 (mod m).


It can be proved that if a and m have no common prime factors, then a has a unique reciprocal modulo m ; conversely, if a 
and m have a common prime factor, then a has no reciprocal modulo m. 

EXAMPLE 4 Reciprocal of 3 mod 26 


The number 3 has a reciprocal modulo 26 because 3 and 26 have no common prime factors. This reciprocal
can be obtained by finding the number x in Z₂₆ that satisfies the modular equation

3x ≡ 1 (mod 26)

Although there are general methods for solving such modular equations, it would take us too far afield to
study them. However, because 26 is relatively small, this equation can be solved by trying the possible
solutions, 0 to 25, one at a time. With this approach we find that x = 9 is the solution, because

3 · 9 = 27 ≡ 1 (mod 26)

Thus,

3⁻¹ ≡ 9 (mod 26)


EXAMPLE 5 A Number with No Reciprocal mod 26 

The number 4 has no reciprocal modulo 26, because 4 and 26 have 2 as a common prime factor (see Exercise 8).


For future reference, in Table 2 we provide the following reciprocals modulo 26:

Table 2 Reciprocals Modulo 26

a     1  3  5  7  9  11 15 17 19 21 23 25
a⁻¹   1  9  21 15 3  19 7  23 11 5  17 25
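Table 2 can be regenerated with the trial-and-error method of Example 4. In Python 3.8 and later, `pow(a, -1, m)` also computes a modular reciprocal. The brute-force loop is kept here because it mirrors the text.

```python
from math import gcd

def reciprocal_mod(a, m):
    """Brute-force search for a^-1 mod m, as in Example 4; None if no reciprocal exists."""
    for x in range(m):
        if (a * x) % m == 1:
            return x
    return None

# Only residues sharing no prime factor with 26 have reciprocals.
table = {a: reciprocal_mod(a, 26) for a in range(26) if gcd(a, 26) == 1}
print(table)
print(reciprocal_mod(4, 26))   # None: 4 and 26 share the factor 2
```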


Deciphering 

Every useful cipher must have a procedure for decipherment. In the case of a Hill cipher, decipherment uses the inverse 
(mod 26) of the enciphering matrix. To be precise, if m is a positive integer, then a square matrix A with entries in Zₘ is
said to be invertible modulo m if there is a matrix B with entries in Zₘ such that

AB ≡ BA ≡ I (mod m)

Suppose now that

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

is invertible modulo 26 and this matrix is used in a Hill 2-cipher. If

$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}$$

is a plaintext vector, then

c = Ap (mod 26)

is the corresponding ciphertext vector and

p = A⁻¹c (mod 26)

Thus, each plaintext vector can be recovered from the corresponding ciphertext vector by multiplying it on the left by
A⁻¹ (mod 26).


In cryptography it is important to know which matrices are invertible modulo 26 and how to obtain their inverses. We now 
investigate these questions. 

In ordinary arithmetic, a square matrix A is invertible if and only if det(A) ≠ 0, or, equivalently, if and only if det(A) has a
reciprocal. The following theorem is the analog of this result in modular arithmetic.


THEOREM 10.15.2 

A square matrix A with entries in Zₘ is invertible modulo m if and only if the residue of det(A) modulo m has a
reciprocal modulo m.


The residue of det(A) modulo m has a reciprocal modulo m if and only if this residue and m have no common
prime factors. This gives the following corollary.


COROLLARY 10.15.3

A square matrix A with entries in Zₘ is invertible modulo m if and only if m and the residue of det(A) modulo m
have no common prime factors.


Because the only prime factors of m = 26 are 2 and 13, we have the following corollary, which is useful in cryptography.


COROLLARY 10.15.4

A square matrix A with entries in Z₂₆ is invertible modulo 26 if and only if the residue of det(A) modulo 26 is not
divisible by 2 or 13.






We leave it for you to verify that if

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

has entries in Z₂₆ and the residue of det(A) = ad − bc modulo 26 is not divisible by 2 or 13, then the inverse of A (mod 26) is given by

$$A^{-1} = (ad - bc)^{-1} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \pmod{26} \tag{2}$$

where (ad − bc)⁻¹ is the reciprocal of the residue of ad − bc (mod 26).

EXAMPLE 6 Inverse of a Matrix mod 26 

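Find the inverse of

$$A = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix}$$

modulo 26.

Solution

det(A) = ad − bc = 5 · 3 − 6 · 2 = 3

so from Table 2,

(ad − bc)⁻¹ = 3⁻¹ ≡ 9 (mod 26)

Thus, from (2),

$$A^{-1} = 9\begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 27 & -54 \\ -18 & 45 \end{bmatrix} \equiv \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \pmod{26}$$

As a check,

$$AA^{-1} = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} = \begin{bmatrix} 53 & 234 \\ 26 & 105 \end{bmatrix} \equiv \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \pmod{26}$$

Similarly, A⁻¹A = I.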
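The 2 × 2 inverse formula A⁻¹ = (ad − bc)⁻¹[[d, −b], [−c, a]] (mod 26) can be sketched as follows. The sketch checks Corollary 10.15.4 first. It then relies on Python 3.8+ `pow(det, -1, 26)` for the reciprocal of the determinant.

```python
def inverse_mod26_2x2(A):
    """Inverse of [[a, b], [c, d]] modulo 26 via (ad - bc)^-1 [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % 26
    if det % 2 == 0 or det % 13 == 0:          # Corollary 10.15.4
        raise ValueError("matrix is not invertible modulo 26")
    det_inv = pow(det, -1, 26)                 # reciprocal of the determinant
    return [[(det_inv * d) % 26, (det_inv * -b) % 26],
            [(det_inv * -c) % 26, (det_inv * a) % 26]]

def mat_mul_mod(X, Y, m=26):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % m for j in range(2)]
            for i in range(2)]

A = [[5, 6], [2, 3]]
A_inv = inverse_mod26_2x2(A)
print(A_inv)
print(mat_mul_mod(A, A_inv), mat_mul_mod(A_inv, A))
```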


EXAMPLE 7 Decoding a Hill 2-Cipher 

Decode the following Hill 2-cipher, which was enciphered by the matrix in Example 6: 

GTNKGKDUSK 

From Table 1 the numerical equivalent of this ciphertext is 

7 20 14 11 7 11 4 21 19 11 

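To obtain the plaintext pairs, we multiply each ciphertext vector by the inverse of A (obtained in Example 6):

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix}\begin{bmatrix} 7 \\ 20 \end{bmatrix} = \begin{bmatrix} 487 \\ 436 \end{bmatrix} \equiv \begin{bmatrix} 19 \\ 20 \end{bmatrix} \pmod{26}$$

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix}\begin{bmatrix} 14 \\ 11 \end{bmatrix} = \begin{bmatrix} 278 \\ 321 \end{bmatrix} \equiv \begin{bmatrix} 18 \\ 9 \end{bmatrix} \pmod{26}$$

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix}\begin{bmatrix} 7 \\ 11 \end{bmatrix} = \begin{bmatrix} 271 \\ 265 \end{bmatrix} \equiv \begin{bmatrix} 11 \\ 5 \end{bmatrix} \pmod{26}$$

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix}\begin{bmatrix} 4 \\ 21 \end{bmatrix} = \begin{bmatrix} 508 \\ 431 \end{bmatrix} \equiv \begin{bmatrix} 14 \\ 15 \end{bmatrix} \pmod{26}$$

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix}\begin{bmatrix} 19 \\ 11 \end{bmatrix} = \begin{bmatrix} 283 \\ 361 \end{bmatrix} \equiv \begin{bmatrix} 23 \\ 23 \end{bmatrix} \pmod{26}$$

From Table 1, the alphabet equivalents of these vectors are

ST RI KE NO WW

which yields the message

STRIKE NOW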
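Deciphering is the same matrix-vector arithmetic with A⁻¹ in place of A. Here is a minimal sketch using the inverse from Example 6 and the Table 1 numbering:

```python
def hill2_decipher(ciphertext, A_inv):
    """Multiply each ciphertext pair by A_inv mod 26 and map back through Table 1."""
    nums = [(ord(ch) - ord("A") + 1) % 26 for ch in ciphertext]   # A=1, ..., Z=0
    out = []
    for i in range(0, len(nums), 2):
        c1, c2 = nums[i], nums[i + 1]
        for row in A_inv:
            k = (row[0] * c1 + row[1] * c2) % 26
            out.append("Z" if k == 0 else chr(ord("A") + k - 1))
    return "".join(out)

print(hill2_decipher("GTNKGKDUSK", [[1, 24], [8, 19]]))
```

The trailing W in the output is the dummy letter that filled out the last pair.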


Breaking a Hill Cipher 

Because the purpose of enciphering messages and information is to prevent “opponents” from learning their contents, 
cryptographers are concerned with the security of their ciphers—that is, how readily they can be broken (deciphered by 
their opponents). We will conclude this section by discussing one technique for breaking Hill ciphers. 

Suppose that you are able to obtain some corresponding plaintext and ciphertext from an opponent's message. For example, 
on examining some intercepted ciphertext, you may be able to deduce that the message is a letter that begins DEAR SIR. We 
will show that with a small amount of such data, it may be possible to determine the deciphering matrix of a Hill code and 
consequently obtain access to the rest of the message. 

It is a basic result in linear algebra that a linear transformation is completely determined by its values at a basis. This principle suggests that if we have a Hill n-cipher, and if

$$\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$$

are linearly independent plaintext vectors whose corresponding ciphertext vectors

$$A\mathbf{p}_1, A\mathbf{p}_2, \ldots, A\mathbf{p}_n$$

are known, then there is enough information available to determine the matrix A and hence $A^{-1} \pmod{m}$.

The following theorem, whose proof is discussed in the exercises, provides a way to do this. 


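THEOREM 10.15.5 Determining the Deciphering Matrix

Let $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ be linearly independent plaintext vectors, and let $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ be the corresponding ciphertext vectors in a Hill n-cipher. If

$$P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \vdots \\ \mathbf{p}_n^T \end{bmatrix}$$

is the n × n matrix with row vectors $\mathbf{p}_1^T, \mathbf{p}_2^T, \ldots, \mathbf{p}_n^T$ and if

$$C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \\ \vdots \\ \mathbf{c}_n^T \end{bmatrix}$$

is the n × n matrix with row vectors $\mathbf{c}_1^T, \mathbf{c}_2^T, \ldots, \mathbf{c}_n^T$, then the sequence of elementary row operations that reduces C to I transforms P to $(A^{-1})^T$.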


This theorem tells us that to find the transpose of the deciphering matrix $A^{-1}$, we must find a sequence of row operations that reduces C to I and then perform this same sequence of operations on P. The following example illustrates a simple algorithm for doing this.
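The row reduction called for by Theorem 10.15.5 can be sketched in Python as Gauss-Jordan elimination on [C | P] with every step reduced modulo 26. This is an illustration, not the text's procedure verbatim: it assumes each column contains an entry that is invertible modulo 26 to serve as a pivot, and it raises `ValueError` otherwise (a more careful implementation could combine rows to manufacture such a pivot). The function name is hypothetical; the data are the vectors of Example 8.

```python
from math import gcd


def reduce_augmented_mod(C, P, m=26):
    """Row-reduce [C | P] modulo m until the left block is I.

    Returns the right block, which by Theorem 10.15.5 is (A^-1)^T.
    Sketch only: assumes every column offers a pivot invertible mod m.
    """
    n = len(C)
    M = [[x % m for x in C[i]] + [x % m for x in P[i]] for i in range(n)]
    for col in range(n):
        # Find a row at or below `col` whose entry is a unit modulo m.
        pivot = next((r for r in range(col, n) if gcd(M[r][col], m) == 1), None)
        if pivot is None:
            raise ValueError("no pivot invertible modulo m in this column")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, m)
        M[col] = [(inv * x) % m for x in M[col]]    # scale pivot row to get 1
        for r in range(n):
            if r != col and M[r][col]:
                factor = M[r][col]
                # Add -factor times the pivot row, then take residues.
                M[r] = [(x - factor * y) % m for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


# Example 8: rows of C are ciphertext vectors, rows of P plaintext vectors.
C = [[9, 15], [19, 2]]    # IO, SB
P = [[4, 5], [1, 18]]     # DE, AR
At = reduce_augmented_mod(C, P)
A_inv = [list(r) for r in zip(*At)]   # transpose to get A^-1
print(At)      # [[1, 0], [17, 9]]
print(A_inv)   # [[1, 17], [0, 9]]
```

With the pairs of Exercise 4 (ciphertext SL HK for plaintext AR MY), the same function returns [[7, 6], [15, 5]], whose transpose is the deciphering matrix given in that exercise's answer.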

EXAMPLE 8 Using Theorem 10.15.5 


The following Hill 2-cipher is intercepted: 

IOSBTGXESPXHOPDE 

Decipher the message, given that it starts with the word DEAR. 

From Table 1, the numerical equivalent of the known plaintext is 

DE AR 
4 5 1 18 

and the numerical equivalent of the corresponding ciphertext is 

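IO SB

9 15 19 2

so the corresponding plaintext and ciphertext vectors are

$$\mathbf{p}_1 = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \leftrightarrow \mathbf{c}_1 = \begin{bmatrix} 9 \\ 15 \end{bmatrix}, \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \leftrightarrow \mathbf{c}_2 = \begin{bmatrix} 19 \\ 2 \end{bmatrix}$$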


We want to reduce

$$C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \end{bmatrix} = \begin{bmatrix} 9 & 15 \\ 19 & 2 \end{bmatrix}$$

to I by elementary row operations and simultaneously apply these operations to

$$P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 1 & 18 \end{bmatrix}$$

to obtain $(A^{-1})^T$ (the transpose of the deciphering matrix). This can be accomplished by adjoining P to the right of C and applying row operations to the resulting matrix [C | P] until the left side is reduced to I. The final matrix will then have the form $[I \mid (A^{-1})^T]$. The computations can be carried out as follows:


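$$\left[\begin{array}{cc|cc} 9 & 15 & 4 & 5 \\ 19 & 2 & 1 & 18 \end{array}\right] \qquad \text{We formed the matrix } [C \mid P].$$

$$\left[\begin{array}{cc|cc} 1 & 45 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right] \qquad \text{We multiplied the first row by } 9^{-1} = 3.$$

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right] \qquad \text{We replaced 45 by its residue modulo 26.}$$

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & -359 & -227 & -267 \end{array}\right] \qquad \text{We added } -19 \text{ times the first row to the second.}$$

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 5 & 7 & 19 \end{array}\right] \qquad \text{We replaced the entries in the second row by their residues modulo 26.}$$

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 147 & 399 \end{array}\right] \qquad \text{We multiplied the second row by } 5^{-1} = 21.$$

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 17 & 9 \end{array}\right] \qquad \text{We replaced the entries in the second row by their residues modulo 26.}$$

$$\left[\begin{array}{cc|cc} 1 & 0 & -311 & -156 \\ 0 & 1 & 17 & 9 \end{array}\right] \qquad \text{We added } -19 \text{ times the second row to the first.}$$

$$\left[\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & 1 & 17 & 9 \end{array}\right] \qquad \text{We replaced the entries in the first row by their residues modulo 26.}$$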


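Thus,

$$(A^{-1})^T = \begin{bmatrix} 1 & 0 \\ 17 & 9 \end{bmatrix}$$

so the deciphering matrix is

$$A^{-1} = \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix}$$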


To decipher the message, we first group the ciphertext into pairs and find the numerical equivalent of each 
letter: 

IO SB TG XE SP XH OP DE

9 15 19 2 20 7 24 5 19 16 24 8 15 16 4 5

Next, we multiply successive ciphertext vectors on the left by $A^{-1}$ and find the alphabet equivalents of the resulting plaintext pairs:




(mod 26) 


1 

0 

1 

0 

1 

0 

1 

0 

1 

0 

1 

0 

1 

0 

"1 

0 


17 

9 

17 

9 

17 

9 

17 

9 

17 

9 

17 

9 

17 

9 


17 



'4' 

D 

5_ 

E 

f 

A 

18_ 

R 

9" 

I 

11 _ 

K 

' 5' 

E 

. 19 . 

S 

5' 

E 

14 

N 

' 4' 

D 

_20_ 

T 

f 

A 

_14_ 

N 

'll' 

K 

19 

S 


Finally, we construct the message from the plaintext pairs: 

DE AR IK ES EN DT AN KS 
DEAR IKE SEND TANKS 


Further Readings 

Readers interested in learning more about mathematical cryptography are referred to the following books, the first 
of which is elementary and the second more advanced. 

1. Abraham Sinkov, Elementary Cryptanalysis, a Mathematical Approach (Mathematical Association of America, 2009). 

2. Alan G. Konheim, Cryptography, a Primer (New York: Wiley-Interscience, 1981). 


Exercise Set 10.15 


1. Obtain the Hill cipher of the message

DARK NIGHT

for each of the following enciphering matrices:

(a) $\begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}$    (b) $\begin{bmatrix} 4 & 3 \\ 1 & 2 \end{bmatrix}$


Answer: 




























(a) GIYUOKEVBH 

(b) SFANEFZWJH 


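2. In each part determine whether the matrix is invertible modulo 26. If so, find its inverse modulo 26 and check your work by verifying that $AA^{-1} = A^{-1}A = I \pmod{26}$.

(a) $A = \begin{bmatrix} 9 & 1 \\ 7 & 2 \end{bmatrix}$    (b) $A = \begin{bmatrix} 3 & 1 \\ 5 & 3 \end{bmatrix}$    (c) $A = \begin{bmatrix} 8 & 11 \\ 1 & 9 \end{bmatrix}$

(d) $A = \begin{bmatrix} 2 & 1 \\ 1 & 7 \end{bmatrix}$    (e) $A = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix}$    (f) $A = \begin{bmatrix} 1 & 8 \\ 1 & 3 \end{bmatrix}$

Answer:

(a) $A^{-1} = \begin{bmatrix} 12 & 7 \\ 23 & 15 \end{bmatrix}$

(b) Not invertible

(c) $A^{-1} = \begin{bmatrix} 1 & 19 \\ 23 & 24 \end{bmatrix}$

(d) Not invertible

(e) Not invertible

(f) $A^{-1} = \begin{bmatrix} 15 & 12 \\ 21 & 5 \end{bmatrix}$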


3. Decode the message 


SAKNOXAOJX 


given that it is a Hill cipher with enciphering matrix 


$$\begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix}$$


Answer: 


WE LOVE MATH 

4. A Hill 2-cipher is intercepted that starts with the pairs 

SL HK

Find the deciphering and enciphering matrices, given that the plaintext is known to start with the word ARMY.

Answer:

Deciphering matrix = $\begin{bmatrix} 7 & 15 \\ 6 & 5 \end{bmatrix}$; enciphering matrix = $\begin{bmatrix} 7 & 5 \\ 2 & 15 \end{bmatrix}$


5. Decode the following Hill 2-cipher if the last four plaintext letters are known to be ATOM. 



LNGIHGYBVRENJYQO


Answer: 

THEY SPLIT THE ATOM 

6. Decode the following Hill 3-cipher if the first nine plaintext letters are IHAVECOME:

HPAFQGGDUGDDHPGODYNOR 

Answer: 

I HAVE COME TO BURY CAESAR 

7. All of the results of this section can be generalized to the case where the plaintext is a binary message; that is, it is a sequence of 0's and 1's. In this case we do all of our modular arithmetic using modulus 2 rather than modulus 26. Thus, for example, 1 + 1 = 0 (mod 2). Suppose we want to encrypt the message 110101111. Let us first break it into triplets to form the three vectors

$$\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

and let us take

$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

as our enciphering matrix.

(a) Find the encoded message.

(b) Find the inverse modulo 2 of the enciphering matrix, and verify that it decodes your encoded message.

Answer:

(a) 010110001

(b) $\begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$

8. If, in addition to the standard alphabet, a period, comma, and question mark were allowed, then 29 plaintext and ciphertext symbols would be available and all matrix arithmetic would be done modulo 29. Under what conditions would a matrix with entries in $Z_{29}$ be invertible modulo 29?

Answer:

A is invertible modulo 29 if and only if $\det(A) \not\equiv 0 \pmod{29}$.

9. Show that the modular equation $4x = 1 \pmod{26}$ has no solution in $Z_{26}$ by successively substituting the values x = 0, 1, 2, ..., 25.

10. (a) Let P and C be the matrices in Theorem 10.15.5. Show that $P = C(A^{-1})^T$.

(b) To prove Theorem 10.15.5, let $E_1, E_2, \ldots, E_n$ be the elementary matrices that correspond to the row operations that reduce C to I, so

$$E_n \cdots E_2 E_1 C = I$$

Show that

$$E_n \cdots E_2 E_1 P = (A^{-1})^T$$

from which it follows that the same sequence of row operations that reduces C to I converts P to $(A^{-1})^T$.


11. (a) If A is the enciphering matrix of a Hill n-cipher, show that

$$A^{-1} = (C^{-1}P)^T \pmod{26}$$

where C and P are the matrices defined in Theorem 10.15.5.

(b) Instead of using Theorem 10.15.5 as in the text, find the deciphering matrix of Example 8 by using the result in part (a) and Equation (2) to compute $C^{-1}$. [Note: Although this method is practical for Hill 2-ciphers, Theorem 10.15.5 is more efficient for Hill n-ciphers with n > 2.]


Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica,
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with 
some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular 
utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. 
Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of 
the problems in the regular exercise sets. 


T1. Two integers that have no common factors (except 1) are said to be relatively prime. Given a positive integer n, let $S_n = \{a_1, a_2, a_3, \ldots, a_m\}$, where $a_1 < a_2 < a_3 < \cdots < a_m$, be the set of all positive integers less than n and relatively prime to n. For example, if n = 9, then

$$S_9 = \{a_1, a_2, a_3, \ldots, a_6\} = \{1, 2, 4, 5, 7, 8\}$$

(a) Construct a table consisting of n and $S_n$ for n = 2, 3, ..., 15, and then compute

$$\sum_{k=1}^{m} a_k \quad \text{and} \quad \left(\sum_{k=1}^{m} a_k\right) \pmod{n}$$

in each case. Draw a conjecture for n > 15 and prove your conjecture to be true. [Hint: Use the fact that if a is relatively prime to n, then n − a is also relatively prime to n.]

(b) Given a positive integer n and the set $S_n$, let $P_n$ be the m × m matrix

$$P_n = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{m-1} & a_m \\ a_2 & a_3 & a_4 & \cdots & a_m & a_1 \\ a_3 & a_4 & a_5 & \cdots & a_1 & a_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{m-1} & a_m & a_1 & \cdots & a_{m-3} & a_{m-2} \\ a_m & a_1 & a_2 & \cdots & a_{m-2} & a_{m-1} \end{bmatrix}$$

so that, for example,

$$P_9 = \begin{bmatrix} 1 & 2 & 4 & 5 & 7 & 8 \\ 2 & 4 & 5 & 7 & 8 & 1 \\ 4 & 5 & 7 & 8 & 1 & 2 \\ 5 & 7 & 8 & 1 & 2 & 4 \\ 7 & 8 & 1 & 2 & 4 & 5 \\ 8 & 1 & 2 & 4 & 5 & 7 \end{bmatrix}$$

Use a computer to compute $\det(P_n)$ and $\det(P_n) \pmod{n}$ for n = 2, 3, ..., 15, and then use these results to construct a conjecture.

(c) Use the results of part (a) to prove your conjecture to be true. [Hint: Add the first m − 1 rows of $P_n$ to its last row and then use Theorem 2.2.3.] What do these results imply about the inverse of $P_n \pmod{n}$?


T2. Given a positive integer n greater than 1, the number of positive integers less than n and relatively prime to n is called the Euler phi function of n and is denoted by $\varphi(n)$. For example, $\varphi(6) = 2$ since only two positive integers (1 and 5) are less than 6 and have no common factor with 6.

(a) Using a computer, for each value of n = 2, 3, ..., 25 compute and print out all positive integers that are less than n and relatively prime to n. Then use these integers to determine the values of $\varphi(n)$ for n = 2, 3, ..., 25. Can you discover a pattern in the results?

(b) It can be shown that if $\{p_1, p_2, p_3, \ldots, p_m\}$ are all the distinct prime factors of n, then

$$\varphi(n) = n\left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right)\left(1 - \frac{1}{p_3}\right)\cdots\left(1 - \frac{1}{p_m}\right)$$

For example, since {2, 3} are the distinct prime factors of 12, we have

$$\varphi(12) = 12\left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right) = 4$$

which agrees with the fact that {1, 5, 7, 11} are the only positive integers less than 12 and relatively prime to 12. Using a computer, print out all the prime factors of n for n = 2, 3, ..., 25. Then compute $\varphi(n)$ using the formula above and compare it to your results in part (a).
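One possible Python sketch for Exercise T2 computes φ(n) both by direct counting and by the product formula of part (b). Factoring by trial division is an assumption that is adequate only for small n, and the function names are ours.

```python
from math import gcd


def phi_by_counting(n):
    """Count the positive integers less than n that are relatively prime to n."""
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)


def distinct_prime_factors(n):
    """Distinct prime factors of n by trial division (fine for small n)."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def phi_by_formula(n):
    """phi(n) = n * prod(1 - 1/p), kept in exact integer arithmetic."""
    result = n
    for p in distinct_prime_factors(n):
        result = result // p * (p - 1)   # p still divides result here
    return result


for n in range(2, 26):
    print(n, distinct_prime_factors(n), phi_by_counting(n), phi_by_formula(n))
```

Writing each factor as $(p - 1)/p$ and dividing before multiplying keeps every intermediate value an integer, so no floating-point rounding enters the comparison.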





10.16 Genetics 

In this section we investigate the propagation of an inherited trait in successive generations by computing 
powers of a matrix. 


Prerequisites 

Eigenvalues and Eigenvectors 
Diagonalization of a Matrix 
Intuitive Understanding of Limits 


Inheritance Traits 

In this section we examine the inheritance of traits in animals or plants. The inherited trait under consideration
is assumed to be governed by a set of two genes, which we designate by A and a. Under autosomal
inheritance each individual in the population of either gender possesses two of these genes, the possible
pairings being designated AA, Aa, and aa. This pair of genes is called the individual's genotype, and it
determines how the trait controlled by the genes is manifested in the individual. For example, in snapdragons
a set of two genes determines the color of the flower. Genotype AA produces red flowers, genotype Aa
produces pink flowers, and genotype aa produces white flowers. In humans, eye coloration is controlled
through autosomal inheritance. Genotypes AA and Aa have brown eyes, and genotype aa has blue eyes. In this
case we say that gene A dominates gene a, or that gene a is recessive to gene A, because genotype Aa has the
same outward trait as genotype AA.

In addition to autosomal inheritance we will also discuss X-linked inheritance. In this type of inheritance, the 
male of the species possesses only one of the two possible genes (A or a), and the female possesses a pair of 
the two genes (AA, aa, or Aa). In humans, color blindness, hereditary baldness, hemophilia, and muscular 
dystrophy, to name a few, are traits controlled by X-linked inheritance. 

Below we explain the manner in which the genes of the parents are passed on to their offspring for the two 
types of inheritance. We construct matrix models that give the probable genotypes of the offspring in terms of 
the genotypes of the parents, and we use these matrix models to follow the genotype distribution of a 
population through successive generations. 


Autosomal Inheritance 

In autosomal inheritance an individual inherits one gene from each of its parents' pairs of genes to form its 
own particular pair. As far as we know, it is a matter of chance which of the two genes a parent passes on to 
the offspring. Thus, if one parent is of genotype Aa, it is equally likely that the offspring will inherit the A
gene or the a gene from that parent. If one parent is of genotype aa and the other parent is of genotype Aa, the
offspring will always receive an a gene from the aa parent and will receive either an A gene or an a gene, with 
equal probability, from the Aa parent. Consequently, each of the offspring has equal probability of being 
genotype aa or Aa. In Table 1 we list the probabilities of the possible genotypes of the offspring for all 
possible combinations of the genotypes of the parents. 

Table 1

                               Genotypes of Parents
Genotype of
Offspring      AA-AA   AA-Aa   AA-aa   Aa-Aa   Aa-aa   aa-aa
AA               1      1/2      0      1/4      0       0
Aa               0      1/2      1      1/2     1/2      0
aa               0       0       0      1/4     1/2      1


EXAMPLE 1 Distribution of Genotypes in a Population 

Suppose that a farmer has a large population of plants consisting of some distribution of all 
three possible genotypes AA , Aa , and aa. The farmer desires to undertake a breeding program in 
which each plant in the population is always fertilized with a plant of genotype AA and is then 
replaced by one of its offspring. We want to derive an expression for the distribution of the 
three possible genotypes in the population after any number of generations. 

For n = 0, 1, 2, ..., let us set

$a_n$ = fraction of plants of genotype AA in nth generation
$b_n$ = fraction of plants of genotype Aa in nth generation
$c_n$ = fraction of plants of genotype aa in nth generation

Thus $a_0$, $b_0$, and $c_0$ specify the initial distribution of the genotypes. We also have that

$$a_n + b_n + c_n = 1 \quad \text{for } n = 0, 1, 2, \ldots$$

From Table 1 we can determine the genotype distribution of each generation from the genotype distribution of the preceding generation by the following equations:

$$\left.\begin{aligned} a_n &= a_{n-1} + \tfrac{1}{2} b_{n-1} \\ b_n &= c_{n-1} + \tfrac{1}{2} b_{n-1} \\ c_n &= 0 \end{aligned}\right\} \quad n = 1, 2, \ldots \qquad (1)$$

For example, the first of these three equations states that all the offspring of a plant of genotype 
AA will be of genotype AA under this breeding program and that half of the offspring of a plant 
of genotype Aa will be of genotype AA. 













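Equations (1) can be written in matrix notation as

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \qquad (2)$$

where

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix}, \quad \mathbf{x}^{(n-1)} = \begin{bmatrix} a_{n-1} \\ b_{n-1} \\ c_{n-1} \end{bmatrix}, \quad \text{and} \quad M = \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

Note that the three columns of the matrix M are the same as the first three columns of Table 1.
From Equation (2) it follows that

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)} = M^2\mathbf{x}^{(n-2)} = \cdots = M^n\mathbf{x}^{(0)} \qquad (3)$$

Consequently, if we can find an explicit expression for $M^n$, we can use (3) to obtain an explicit expression for $\mathbf{x}^{(n)}$. To find an explicit expression for $M^n$, we first diagonalize M. That is, we find an invertible matrix P and a diagonal matrix D such that

$$M = PDP^{-1} \qquad (4)$$

With such a diagonalization, we then have (see Exercise 1)

$$M^n = PD^nP^{-1} \quad \text{for } n = 1, 2, \ldots$$

where

$$D^n = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_k \end{bmatrix}^n = \begin{bmatrix} \lambda_1^n & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^n & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_k^n \end{bmatrix}$$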

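The diagonalization of M is accomplished by finding its eigenvalues and corresponding eigenvectors. These are as follows (verify):

Eigenvalues: $\lambda_1 = 1$, $\lambda_2 = \tfrac{1}{2}$, $\lambda_3 = 0$

$$\text{Corresponding eigenvectors:} \quad \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$$

Thus, in Equation (4) we have

$$D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

and

$$P = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \mathbf{v}_3] = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix}$$

Here $P^{-1} = P$, so

$$\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & (\tfrac{1}{2})^n & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} = \begin{bmatrix} a_0 + b_0 + c_0 - (\tfrac{1}{2})^n b_0 - (\tfrac{1}{2})^{n-1} c_0 \\ (\tfrac{1}{2})^n b_0 + (\tfrac{1}{2})^{n-1} c_0 \\ 0 \end{bmatrix}$$

Using the fact that $a_0 + b_0 + c_0 = 1$, we thus have

$$\left.\begin{aligned} a_n &= 1 - (\tfrac{1}{2})^n b_0 - (\tfrac{1}{2})^{n-1} c_0 \\ b_n &= (\tfrac{1}{2})^n b_0 + (\tfrac{1}{2})^{n-1} c_0 \\ c_n &= 0 \end{aligned}\right\} \quad n = 1, 2, \ldots \qquad (5)$$

These are explicit formulas for the fractions of the three genotypes in the nth generation of plants in terms of the initial genotype fractions.

Because $(\tfrac{1}{2})^n$ tends to zero as n approaches infinity, it follows from these equations that

$$a_n \to 1, \quad b_n \to 0, \quad c_n = 0$$

as n approaches infinity. That is, in the limit all plants in the population will be genotype AA.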
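The explicit formulas of Example 1 can be checked against repeated multiplication by M. This Python sketch uses exact rational arithmetic from the standard `fractions` module; the initial distribution $(a_0, b_0, c_0) = (1/5, 1/2, 3/10)$ is an arbitrary choice for illustration.

```python
from fractions import Fraction as F

# Transition matrix of Example 1 (every plant fertilized with genotype AA).
M = [[F(1), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1)],
     [F(0), F(0), F(0)]]


def step(x):
    """One generation: x^(n) = M x^(n-1), computed exactly."""
    return [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]


def closed_form(n, b0, c0):
    """Explicit formulas of Example 1 for n >= 1 (assumes a0 + b0 + c0 = 1)."""
    half_n = F(1, 2) ** n
    return [1 - half_n * b0 - 2 * half_n * c0,   # 2*(1/2)^n = (1/2)^(n-1)
            half_n * b0 + 2 * half_n * c0,
            F(0)]


a0, b0, c0 = F(1, 5), F(1, 2), F(3, 10)   # arbitrary illustrative start
x = [a0, b0, c0]
for n in range(1, 21):
    x = step(x)
    assert x == closed_form(n, b0, c0)
print([float(v) for v in x])   # a_20 is close to 1 and b_20 is close to 0
```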


EXAMPLE 2 Modifying Example 1 




















We can modify Example 1 so that instead of each plant being fertilized with one of genotype AA, each plant is fertilized with a plant of its own genotype. Using the same notation as in Example 1, we then find

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}$$

where

$$M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{4} & 1 \end{bmatrix}$$

The columns of this new matrix M are the same as the columns of Table 1 corresponding to parents with genotypes AA-AA, Aa-Aa, and aa-aa.

The eigenvalues of M are (verify)

$$\lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2}$$

The eigenvalue $\lambda_1 = 1$ has multiplicity two and its corresponding eigenspace is two-dimensional. Picking two linearly independent eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in that eigenspace, and a single eigenvector $\mathbf{v}_3$ for the simple eigenvalue $\lambda_3 = \tfrac{1}{2}$, we have (verify)

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$$

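The calculations for $\mathbf{x}^{(n)}$ are then

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & (\tfrac{1}{2})^n \end{bmatrix}\begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & -\tfrac{1}{2} & 0 \end{bmatrix}\begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix}$$

Thus,

$$\left.\begin{aligned} a_n &= a_0 + \left[\tfrac{1}{2} - (\tfrac{1}{2})^{n+1}\right] b_0 \\ b_n &= (\tfrac{1}{2})^n b_0 \\ c_n &= c_0 + \left[\tfrac{1}{2} - (\tfrac{1}{2})^{n+1}\right] b_0 \end{aligned}\right\} \quad n = 1, 2, \ldots \qquad (6)$$

In the limit, as n tends to infinity, $(\tfrac{1}{2})^n \to 0$ and $(\tfrac{1}{2})^{n+1} \to 0$, so

$$a_n \to a_0 + \tfrac{1}{2} b_0, \quad b_n \to 0, \quad c_n \to c_0 + \tfrac{1}{2} b_0$$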

Thus, fertilization of each plant with one of its own genotype produces a population that in the 
limit contains only genotypes AA and aa. 
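A similar exact check for Example 2 confirms that $PDP^{-1}$ reproduces M and that the closed-form expressions for $a_n$, $b_n$, $c_n$ agree with repeated multiplication. The matrix $P^{-1}$ below was obtained by inverting P by hand, and the initial vector is again an arbitrary illustration; the helper names are ours.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Exact product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# Example 2: each plant fertilized with a plant of its own genotype.
M = [[F(1), F(1, 4), F(0)],
     [F(0), F(1, 2), F(0)],
     [F(0), F(1, 4), F(1)]]
P = [[F(1), F(0), F(1)],          # columns are v1, v2, v3
     [F(0), F(0), F(-2)],
     [F(0), F(1), F(1)]]
P_inv = [[F(1), F(1, 2), F(0)],   # inverse of P, worked out by hand
         [F(0), F(1, 2), F(1)],
         [F(0), F(-1, 2), F(0)]]
D = [[F(1), F(0), F(0)],
     [F(0), F(1), F(0)],
     [F(0), F(0), F(1, 2)]]
I3 = [[F(int(i == j)) for j in range(3)] for i in range(3)]

assert matmul(P, P_inv) == I3
assert matmul(matmul(P, D), P_inv) == M   # M = P D P^-1


def closed_form(n, a0, b0, c0):
    """Closed-form (a_n, b_n, c_n) of Example 2."""
    s = F(1, 2) - F(1, 2) ** (n + 1)
    return [a0 + s * b0, F(1, 2) ** n * b0, c0 + s * b0]


x0 = [F(1, 5), F(1, 2), F(3, 10)]   # arbitrary illustrative start
x = [[v] for v in x0]               # column vector
for n in range(1, 21):
    x = matmul(M, x)
    assert [row[0] for row in x] == closed_form(n, *x0)
print([float(row[0]) for row in x])  # b_n has nearly vanished
```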


Autosomal Recessive Diseases 

There are many genetic diseases governed by autosomal inheritance in which a normal gene A dominates an 
abnormal gene a. Genotype AA is a normal individual; genotype Aa is a carrier of the disease but is not
afflicted with the disease; and genotype aa is afflicted with the disease. In humans such genetic diseases are 
often associated with a particular racial group—for instance, cystic fibrosis (predominant among Caucasians), 
sickle-cell anemia (predominant among people of African origin), Cooley's anemia (predominant among 
people of Mediterranean origin), and Tay-Sachs disease (predominant among Eastern European Jews). 

Suppose that an animal breeder has a population of animals that carries an autosomal recessive disease. 
Suppose further that those animals afflicted with the disease do not survive to maturity. One possible way to 
control such a disease is for the breeder to always mate a female, regardless of her genotype, with a normal 
male. In this way, all future offspring will either have a normal father and a normal mother (AA-AA matings) 
or a normal father and a carrier mother ( AA-Aa matings). There can be no AA-aa matings since animals of 
genotype aa do not survive to maturity. Under this type of mating program no future offspring will be 
afflicted with the disease, although there will still be carriers in future generations. Let us now determine the 
fraction of carriers in future generations. We set

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \end{bmatrix}, \quad n = 0, 1, 2, \ldots$$

where

$a_n$ = fraction of population of genotype AA in nth generation
$b_n$ = fraction of population of genotype Aa (carriers) in nth generation

Because each offspring has at least one normal parent, we may consider the controlled mating program as one of continual mating with genotype AA, as in Example 1. Thus, the transition of genotype distributions from one generation to the next is governed by the equation

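$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots$$

where

$$M = \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{bmatrix}$$

Because we know the initial distribution $\mathbf{x}^{(0)}$, the distribution of genotypes in the nth generation is thus given by

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots$$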

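The diagonalization of M is easily carried out (see Exercise 4) and leads to

$$\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & (\tfrac{1}{2})^n \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} a_0 \\ b_0 \end{bmatrix} = \begin{bmatrix} a_0 + b_0 - (\tfrac{1}{2})^n b_0 \\ (\tfrac{1}{2})^n b_0 \end{bmatrix}$$

Because $a_0 + b_0 = 1$, we have

$$\left.\begin{aligned} a_n &= 1 - (\tfrac{1}{2})^n b_0 \\ b_n &= (\tfrac{1}{2})^n b_0 \end{aligned}\right\} \quad n = 1, 2, \ldots \qquad (7)$$

Thus, as n tends to infinity, we have

$$a_n \to 1, \quad b_n \to 0$$

so in the limit there will be no carriers in the population.
From (7) we see that

$$b_n = \tfrac{1}{2} b_{n-1}, \quad n = 1, 2, \ldots \qquad (8)$$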


That is, the fraction of carriers in each generation is one-half the fraction of carriers in the preceding 
generation. It would be of interest also to investigate the propagation of carriers under random mating, when 
two animals mate without regard to their genotypes. Unfortunately, such random mating leads to nonlinear 
equations, and the techniques of this section are not applicable. However, by other techniques it can be shown 
that under random mating, Equation 8 is replaced by 


$$b_n = \frac{b_{n-1}}{1 + \tfrac{1}{2} b_{n-1}}, \quad n = 1, 2, \ldots \qquad (9)$$



As a numerical example, suppose that the breeder starts with a population in which 10% of the animals are
carriers. Under the controlled-mating program governed by Equation (8), the percentage of carriers can be
reduced to 5% in one generation. But under random mating, Equation (9) predicts that about 9.5% of the population
will be carriers after one generation ($b_n \approx .095$ if $b_{n-1} = .10$). In addition, under controlled mating no
offspring will ever be afflicted with the disease, but with random mating it can be shown that about 1 in 400
offspring will be born with the disease when 10% of the population are carriers.
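The comparison between Equations (8) and (9) can be reproduced numerically. This Python sketch iterates both recurrences with exact fractions and also counts how many random-mating generations take the carrier fraction from 25% to 10%, the setting of Exercise 5. The 1-in-400 line rests on an assumption stated in the comment (parents paired at random, each a carrier with probability b); the function names are ours.

```python
from fractions import Fraction as F


def controlled(b):
    """Equation (8): the carrier fraction halves each generation."""
    return b / 2


def random_mating(b):
    """Equation (9): b_n = b_{n-1} / (1 + b_{n-1}/2)."""
    return b / (1 + b / 2)


b0 = F(1, 10)                     # 10% carriers
print(float(controlled(b0)))      # 0.05
print(float(random_mating(b0)))   # 2/21, about 0.095
# Assuming parents are paired at random and each is a carrier with
# probability b, an offspring is afflicted when both parents pass on gene a:
print(float((b0 / 2) ** 2))       # 0.0025, i.e. 1 in 400

# Exercise 5 setting: start at 25% carriers under random mating.
b, n = F(1, 4), 0
while b > F(1, 10):
    b, n = random_mating(b), n + 1
print(n)                                     # 12
print(float(F(1, 4) * F(1, 2) ** n * 100))   # percent carriers under (8)
```

Exact fractions are used because under (9) the carrier fraction reaches exactly 1/10 after 12 generations, a boundary that floating-point rounding could blur.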


X-Linked Inheritance 

As mentioned in the introduction, in X-linked inheritance the male possesses one gene (A or a) and the female 
possesses two genes (AA, Aa, or aa). The term X-linked is used because such genes are found on the
X-chromosome, of which the male has one and the female has two. The inheritance of such genes is as 
follows: A male offspring receives one of his mother's two genes with equal probability, and a female 
offspring receives the one gene of her father and one of her mother's two genes with equal probability. 

Readers familiar with basic probability can verify that this type of inheritance leads to the genotype 
probabilities in Table 2. 

Table 2

                                       Genotypes of Parents (Father, Mother)
Genotype of Offspring      (A, AA)   (A, Aa)   (A, aa)   (a, AA)   (a, Aa)   (a, aa)
Male      A                   1        1/2        0         1        1/2        0
          a                   0        1/2        1         0        1/2        1
Female    AA                  1        1/2        0         0         0         0
          Aa                  0        1/2        1         1        1/2        0
          aa                  0         0         0         0        1/2        1


We will discuss a program of inbreeding in connection with X-linked inheritance. We begin with a male and 
female; select two of their offspring at random, one of each gender, and mate them; select two of the resulting 
offspring and mate them; and so forth. Such inbreeding is commonly performed with animals. (Among 
humans, such brother-sister marriages were used by the rulers of ancient Egypt to keep the royal line pure.) 

The original male-female pair can be one of the six types, corresponding to the six columns of Table 2: 

(A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa)

The sibling pairs mated in each successive generation have certain probabilities of being one of these six types. To compute these probabilities, for n = 0, 1, 2, ..., let us set

$a_n$ = probability sibling pair mated in nth generation is type (A, AA)
$b_n$ = probability sibling pair mated in nth generation is type (A, Aa)
$c_n$ = probability sibling pair mated in nth generation is type (A, aa)
$d_n$ = probability sibling pair mated in nth generation is type (a, AA)
$e_n$ = probability sibling pair mated in nth generation is type (a, Aa)
$f_n$ = probability sibling pair mated in nth generation is type (a, aa)

With these probabilities we form a column vector

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \\ d_n \\ e_n \\ f_n \end{bmatrix}, \quad n = 0, 1, 2, \ldots$$

From Table 2 it follows that

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \qquad (10)$$

where, with rows and columns both ordered (A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa),

$$M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 & 1 & \tfrac{1}{4} & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{4} & 1 & 0 & \tfrac{1}{4} & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{4} & 1 \end{bmatrix}$$

For example, suppose that in the (n − 1)st generation, the sibling pair mated is type (A, Aa). Then their
male offspring will be genotype A or a with equal probability, and their female offspring will be genotype AA
or Aa with equal probability. Because one of the male offspring and one of the female offspring are chosen at
random for mating, the next sibling pair will be one of type (A, AA), (A, Aa), (a, AA), or (a, Aa) with
equal probability. Thus, the second column of M contains $\tfrac{1}{4}$ in each of the four rows corresponding to these
four sibling pairs. (See Exercise 9 for the remaining columns.)


As in our previous examples, it follows from (10) that

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \qquad (11)$$



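After lengthy calculations, the eigenvalues of M turn out to be

$$\lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2}, \quad \lambda_4 = -\tfrac{1}{2}, \quad \lambda_5 = \tfrac{1}{4}(1 + \sqrt{5}), \quad \lambda_6 = \tfrac{1}{4}(1 - \sqrt{5})$$

and the corresponding eigenvectors are the columns of the matrix P in Equation (12).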



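The diagonalization of M then leads to

$$\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \qquad (12)$$

where

$$P = \begin{bmatrix} 1 & 0 & -1 & 1 & \tfrac{1}{4}(-3-\sqrt{5}) & \tfrac{1}{4}(-3+\sqrt{5}) \\ 0 & 0 & 2 & -6 & 1 & 1 \\ 0 & 0 & -1 & -3 & \tfrac{1}{4}(-1+\sqrt{5}) & \tfrac{1}{4}(-1-\sqrt{5}) \\ 0 & 0 & 1 & 3 & \tfrac{1}{4}(-1+\sqrt{5}) & \tfrac{1}{4}(-1-\sqrt{5}) \\ 0 & 0 & -2 & 6 & 1 & 1 \\ 0 & 1 & 1 & -1 & \tfrac{1}{4}(-3-\sqrt{5}) & \tfrac{1}{4}(-3+\sqrt{5}) \end{bmatrix}$$

$$D^n = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & (\tfrac{1}{2})^n & 0 & 0 & 0 \\ 0 & 0 & 0 & (-\tfrac{1}{2})^n & 0 & 0 \\ 0 & 0 & 0 & 0 & [\tfrac{1}{4}(1+\sqrt{5})]^n & 0 \\ 0 & 0 & 0 & 0 & 0 & [\tfrac{1}{4}(1-\sqrt{5})]^n \end{bmatrix}$$

$$P^{-1} = \begin{bmatrix} 1 & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 \\ 0 & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \\ 0 & \tfrac{1}{8} & -\tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{1}{8} & 0 \\ 0 & -\tfrac{1}{24} & -\tfrac{1}{12} & \tfrac{1}{12} & \tfrac{1}{24} & 0 \\ 0 & \tfrac{1}{20}(5+\sqrt{5}) & \tfrac{1}{5}\sqrt{5} & \tfrac{1}{5}\sqrt{5} & \tfrac{1}{20}(5+\sqrt{5}) & 0 \\ 0 & \tfrac{1}{20}(5-\sqrt{5}) & -\tfrac{1}{5}\sqrt{5} & -\tfrac{1}{5}\sqrt{5} & \tfrac{1}{20}(5-\sqrt{5}) & 0 \end{bmatrix}$$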


We will not write out the matrix product in (12), as it is rather unwieldy. However, if a specific vector $\mathbf{x}^{(0)}$ is
given, the calculation for $\mathbf{x}^{(n)}$ is not too cumbersome (see Exercise 6).


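Because the absolute values of the last four diagonal entries of D are less than 1, we see that as n tends to infinity,

$$D^n \to \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

And so, from Equation (12),

$$\mathbf{x}^{(n)} \to P\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}P^{-1}\mathbf{x}^{(0)}$$

Performing the matrix multiplication on the right, we obtain (verify)

$$\mathbf{x}^{(n)} \to \begin{bmatrix} a_0 + \tfrac{2}{3} b_0 + \tfrac{1}{3} c_0 + \tfrac{2}{3} d_0 + \tfrac{1}{3} e_0 \\ 0 \\ 0 \\ 0 \\ 0 \\ f_0 + \tfrac{1}{3} b_0 + \tfrac{2}{3} c_0 + \tfrac{1}{3} d_0 + \tfrac{2}{3} e_0 \end{bmatrix} \qquad (13)$$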


That is, in the limit all sibling pairs will be either type (A, AA) or type (a, aa). For example, if the initial parents are type (A, Aa) (that is, $b_0 = 1$ and $a_0 = c_0 = d_0 = e_0 = f_0 = 0$), then as n tends to infinity,

$$\mathbf{x}^{(n)} \to \begin{bmatrix} \tfrac{2}{3} \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}$$

Thus, in the limit there is probability $\tfrac{2}{3}$ that the sibling pairs will be (A, AA), and probability $\tfrac{1}{3}$ that they will be (a, aa).
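Equation (13) can be checked numerically by applying M many times. This Python sketch uses ordinary floating-point arithmetic and 200 generations, an arbitrary cutoff that is ample here because the largest remaining eigenvalue in absolute value is $\tfrac{1}{4}(1+\sqrt{5}) \approx 0.81$. The helper names are ours.

```python
# States, in the order used for x^(n) and M in Equation (10):
# (A,AA), (A,Aa), (A,aa), (a,AA), (a,Aa), (a,aa)
q = 0.25
M = [[1, q, 0, 0, 0, 0],
     [0, q, 0, 1, q, 0],
     [0, 0, 0, 0, q, 0],
     [0, q, 0, 0, 0, 0],
     [0, q, 1, 0, q, 0],
     [0, 0, 0, 0, q, 1]]


def step(x):
    """One generation of inbreeding: x^(n) = M x^(n-1)."""
    return [sum(M[i][j] * x[j] for j in range(6)) for i in range(6)]


def limit_eq13(x0):
    """Right-hand side of Equation (13) for an initial vector x0."""
    a0, b0, c0, d0, e0, f0 = x0
    first = a0 + 2 * b0 / 3 + c0 / 3 + 2 * d0 / 3 + e0 / 3
    last = f0 + b0 / 3 + 2 * c0 / 3 + d0 / 3 + 2 * e0 / 3
    return [first, 0, 0, 0, 0, last]


x0 = [0, 1, 0, 0, 0, 0]      # initial parents of type (A, Aa)
x = x0
for _ in range(200):         # arbitrary but generous number of generations
    x = step(x)
print([round(v, 4) for v in x])   # close to [2/3, 0, 0, 0, 0, 1/3]
print(limit_eq13(x0))
```

Each column of M sums to 1, so the iterates stay probability vectors; the check is only as good as floating-point accuracy, which is why the comparison below uses a tolerance.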


Exercise Set 10.16 

1. Show that if $M = PDP^{-1}$, then $M^n = PD^nP^{-1}$ for n = 1, 2, ....

2. In Example 1 suppose that the plants are always fertilized with a plant of genotype Aa rather than one of 
genotype AA. Derive formulas for the fractions of the plants of genotypes AA, Aa, and aa in the nth 
generation. Also, find the limiting genotype distribution as n tends to infinity. 


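Answer:

$$\left.\begin{aligned} a_n &= \tfrac{1}{4} + (\tfrac{1}{2})^{n+1}(a_0 - c_0) \\ b_n &= \tfrac{1}{2} \\ c_n &= \tfrac{1}{4} - (\tfrac{1}{2})^{n+1}(a_0 - c_0) \end{aligned}\right\} \quad n = 1, 2, \ldots$$

As $n \to \infty$: $a_n \to \tfrac{1}{4}$, $b_n \to \tfrac{1}{2}$, $c_n \to \tfrac{1}{4}$.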


3. In Example 1 suppose that the initial plants are fertilized with genotype AA, the first generation is fertilized with genotype Aa, the second generation is fertilized with genotype AA, and this alternating pattern of fertilization is kept up. Find formulas for the fractions of the plants of genotypes AA, Aa, and aa in the nth generation.

Answer:

$$\left.\begin{aligned} a_{2n+1} &= \tfrac{2}{3} + \frac{1}{6(4)^n}(2a_0 - b_0 - 4c_0) \\ b_{2n+1} &= \tfrac{1}{3} - \frac{1}{6(4)^n}(2a_0 - b_0 - 4c_0) \\ c_{2n+1} &= 0 \end{aligned}\right\} \quad n = 0, 1, 2, \ldots$$

$$\left.\begin{aligned} a_{2n} &= \tfrac{5}{12} + \frac{1}{6(4)^n}(2a_0 - b_0 - 4c_0) \\ b_{2n} &= \tfrac{1}{2} \\ c_{2n} &= \tfrac{1}{12} - \frac{1}{6(4)^n}(2a_0 - b_0 - 4c_0) \end{aligned}\right\} \quad n = 1, 2, \ldots$$


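4. In the section on autosomal recessive diseases, find the eigenvalues and eigenvectors of the matrix M and verify Equation (7).

Answer:

Eigenvalues: $\lambda_1 = 1$, $\lambda_2 = \tfrac{1}{2}$; eigenvectors: $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\mathbf{e}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$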


5. Suppose that a breeder has an animal population in which 25% of the population are carriers of an 
autosomal recessive disease. If the breeder allows the animals to mate irrespective of their genotype, use 
Equation 9 to calculate the number of generations required for the percentage of carriers to fall from 25% 
to 10%. If the breeder instead implements the controlled-mating program determined by Equation 8, what 
will the percentage of carriers be after the same number of generations? 


Answer: 


12 generations; .006% 

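6. In the section on X-linked inheritance, suppose that the initial parents are equally likely to be of any of the six possible genotype parents; that is,

$$\mathbf{x}^{(0)} = \begin{bmatrix} a_0 \\ b_0 \\ c_0 \\ d_0 \\ e_0 \\ f_0 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \end{bmatrix}$$

Using Equation (12), calculate $M^n\mathbf{x}^{(0)}$ and also calculate the limit of $\mathbf{x}^{(n)}$ as n tends to infinity.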

Answer: 




7. From 13 show that under X-linked inheritance with inbreeding, the probability that the limiting sibling 
pairs will be of type (A, AA) is the same as the proportion of A genes in the initial population. 

8. In X-linked inheritance suppose that none of the females of genotype Aa survive to maturity. Under 
inbreeding the possible sibling pairs are then 

(A, AA), (A, aa), (a, AA), and (a, aa)

Find the transition matrix that describes how the genotype distribution changes in one generation.

Answer:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

9. Derive the matrix M in Equation (10) from Table 2.

Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra 
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


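T1.

(a) Use a computer to verify that the eigenvalues and eigenvectors of

$$M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 & 1 & \tfrac{1}{4} & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{4} & 1 & 0 & \tfrac{1}{4} & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{4} & 1 \end{bmatrix}$$

as given in the text are correct.

(b) Starting with $\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}$ and the assumption that

$$\lim_{n \to \infty} \mathbf{x}^{(n)} = \mathbf{x}$$

exists, we must have

$$\lim_{n \to \infty} \mathbf{x}^{(n)} = M \lim_{n \to \infty} \mathbf{x}^{(n-1)} \quad \text{or} \quad \mathbf{x} = M\mathbf{x}$$

This suggests that $\mathbf{x}$ can be solved directly using the equation $(M - I)\mathbf{x} = \mathbf{0}$. Use a computer to solve the equation $\mathbf{x} = M\mathbf{x}$, where

$$\mathbf{x} = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}$$

and a + b + c + d + e + f = 1; compare your results to Equation (13). Explain why the solution to $(M - I)\mathbf{x} = \mathbf{0}$ along with a + b + c + d + e + f = 1 is not specific enough to determine $\lim_{n \to \infty} \mathbf{x}^{(n)}$.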


T2. 


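(a) Given

$$P = \begin{bmatrix}
1 & 0 & -1 & 1 & \frac{1}{4}(-3-\sqrt{5}) & \frac{1}{4}(-3+\sqrt{5}) \\
0 & 0 & 2 & -6 & 1 & 1 \\
0 & 0 & -1 & -3 & \frac{1}{4}(-1+\sqrt{5}) & \frac{1}{4}(-1-\sqrt{5}) \\
0 & 0 & 1 & 3 & \frac{1}{4}(-1+\sqrt{5}) & \frac{1}{4}(-1-\sqrt{5}) \\
0 & 0 & -2 & 6 & 1 & 1 \\
0 & 1 & 1 & -1 & \frac{1}{4}(-3-\sqrt{5}) & \frac{1}{4}(-3+\sqrt{5})
\end{bmatrix}$$

from Equation 12 and

$$\lim_{n \to \infty} D^n = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

use a computer to show that

$$\lim_{n \to \infty} M^n = \begin{bmatrix} 1 & \frac{2}{3} & \frac{1}{3} & \frac{2}{3} & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \frac{1}{3} & \frac{2}{3} & \frac{1}{3} & \frac{2}{3} & 1 \end{bmatrix}$$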


(b) Use a computer to calculate $M^n$ for $n = 10, 20, 30, 40, 50, 60, 70$, and then compare your results to the
limit in part (a).
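One way to carry out both parts, again assuming Python with NumPy: build $P$ and $\lim D^n$ as given, form $P(\lim D^n)P^{-1}$, and then measure how far $M^n$ is from that limit as $n$ grows. The error measure used here (largest absolute entry of the difference) is our own choice for illustration.

```python
import numpy as np

r5 = np.sqrt(5)
M = np.array([
    [1, 1/4, 0, 0, 0,   0],
    [0, 1/4, 0, 1, 1/4, 0],
    [0, 0,   0, 0, 1/4, 0],
    [0, 1/4, 0, 0, 0,   0],
    [0, 1/4, 1, 0, 1/4, 0],
    [0, 0,   0, 0, 1/4, 1],
])
# Columns of P: eigenvectors of M for 1, 1, 1/2, -1/2, (1+sqrt5)/4, (1-sqrt5)/4
P = np.array([
    [1, 0, -1,  1, (-3 - r5) / 4, (-3 + r5) / 4],
    [0, 0,  2, -6, 1,             1],
    [0, 0, -1, -3, (-1 + r5) / 4, (-1 - r5) / 4],
    [0, 0,  1,  3, (-1 + r5) / 4, (-1 - r5) / 4],
    [0, 0, -2,  6, 1,             1],
    [0, 1,  1, -1, (-3 - r5) / 4, (-3 + r5) / 4],
])
D_limit = np.diag([1.0, 1.0, 0, 0, 0, 0])   # lim D^n

# Part (a): lim M^n = P (lim D^n) P^{-1}
M_limit = P @ D_limit @ np.linalg.inv(P)
print(np.round(M_limit, 4))

# Part (b): largest entrywise gap between M^n and the limit
errors = {}
for n in range(10, 71, 10):
    errors[n] = np.abs(np.linalg.matrix_power(M, n) - M_limit).max()
    print(n, errors[n])
```

The gap shrinks roughly like $\left(\frac{1+\sqrt{5}}{4}\right)^n$, the largest eigenvalue of $M$ below 1 in absolute value.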
10.17 Age-Specific Population Growth

In this section we investigate, using the Leslie matrix model, the growth over time of a female population that 
is divided into age classes. We then determine the limiting age distribution and growth rate of the population. 


Prerequisites 

Eigenvalues and Eigenvectors 
Diagonalization of a Matrix 
Intuitive Understanding of Limits 


One of the most common models of population growth used by demographers is the so-called Leslie model
developed in the 1940s. This model describes the growth of the female portion of a human or animal
population. In this model the females are divided into age classes of equal duration. To be specific, suppose
that the maximum age attained by any female in the population is $L$ years (or some other time unit) and we
divide the population into $n$ age classes. Then each class is $L/n$ years in duration. We label the age classes
according to Table 1.

Table 1

Age Class    Age Interval
1            [0, L/n)
2            [L/n, 2L/n)
3            [2L/n, 3L/n)
...          ...
n - 1        [(n - 2)L/n, (n - 1)L/n)
n            [(n - 1)L/n, L]


Suppose that we know the number of females in each of the $n$ classes at time $t = 0$. In particular, let there be
$x_1^{(0)}$ females in the first class, $x_2^{(0)}$ females in the second class, and so forth. With these $n$ numbers we form a
column vector:

$$\mathbf{x}^{(0)} = \begin{bmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{bmatrix}$$

We call this vector the initial age distribution vector.







As time progresses, the number of females within each of the n classes changes because of three biological 
processes: birth, death, and aging. By describing these three processes quantitatively, we will see how to 
project the initial age distribution vector into the future. 


The easiest way to study the aging process is to observe the population at discrete times, say,
$t_0, t_1, t_2, \ldots, t_k, \ldots$. The Leslie model requires that the duration between any two successive observation
times be the same as the duration of the age intervals. Therefore, we set

$$\begin{aligned} t_0 &= 0 \\ t_1 &= L/n \\ t_2 &= 2L/n \\ &\;\;\vdots \\ t_k &= kL/n \\ &\;\;\vdots \end{aligned}$$

With this assumption, all females in the $(i + 1)$-st class at time $t_{k+1}$ were in the $i$th class at time $t_k$.


The birth and death processes between two successive observation times can be described by means of the
following demographic parameters:

$a_i$ ($i = 1, 2, \ldots, n$): the average number of daughters born to each female during the time she is in the $i$th age class

$b_i$ ($i = 1, 2, \ldots, n - 1$): the fraction of females in the $i$th age class that can be expected to survive and pass into the $(i + 1)$-st age class


By their definitions, we have that

(i) $a_i \ge 0$ for $i = 1, 2, \ldots, n$

(ii) $0 < b_i \le 1$ for $i = 1, 2, \ldots, n - 1$

Note that we do not allow any $b_i$ to equal zero, because then no females would survive beyond the $i$th age
class. We also assume that at least one $a_i$ is positive so that some births occur. Any age class for which the
corresponding value of $a_i$ is positive is called a fertile age class.


We next define the age distribution vector $\mathbf{x}^{(k)}$ at time $t_k$ by

$$\mathbf{x}^{(k)} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix}$$

where $x_i^{(k)}$ is the number of females in the $i$th age class at time $t_k$. Now, at time $t_k$, the females in the first age
class are just those daughters born between times $t_{k-1}$ and $t_k$. Thus, we can write








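$$\begin{Bmatrix}\text{number of}\\ \text{females in class 1}\\ \text{at time } t_k\end{Bmatrix} = \begin{Bmatrix}\text{number of daughters}\\ \text{born to females in class 1}\\ \text{between times } t_{k-1} \text{ and } t_k\end{Bmatrix} + \begin{Bmatrix}\text{number of daughters}\\ \text{born to females in class 2}\\ \text{between times } t_{k-1} \text{ and } t_k\end{Bmatrix} + \cdots + \begin{Bmatrix}\text{number of daughters}\\ \text{born to females in class } n\\ \text{between times } t_{k-1} \text{ and } t_k\end{Bmatrix}$$

or, mathematically,

$$x_1^{(k)} = a_1 x_1^{(k-1)} + a_2 x_2^{(k-1)} + \cdots + a_n x_n^{(k-1)} \tag{1}$$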


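The females in the $(i + 1)$-st age class ($i = 1, 2, \ldots, n - 1$) at time $t_k$ are those females in the $i$th class at
time $t_{k-1}$ who are still alive at time $t_k$. Thus,

$$\begin{Bmatrix}\text{number of}\\ \text{females in}\\ \text{class } i+1\\ \text{at time } t_k\end{Bmatrix} = \begin{Bmatrix}\text{fraction of}\\ \text{females in class } i\\ \text{who survive and pass}\\ \text{into class } i+1\end{Bmatrix} \begin{Bmatrix}\text{number of}\\ \text{females in}\\ \text{class } i\\ \text{at time } t_{k-1}\end{Bmatrix}$$

or, mathematically,

$$x_{i+1}^{(k)} = b_i x_i^{(k-1)}, \quad i = 1, 2, \ldots, n - 1 \tag{2}$$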


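Using matrix notation, we can write Equations 1 and 2 as

$$\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix} \begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \\ \vdots \\ x_n^{(k-1)} \end{bmatrix}$$

or more compactly as

$$\mathbf{x}^{(k)} = L\mathbf{x}^{(k-1)}, \quad k = 1, 2, \ldots \tag{3}$$

where $L$ is the Leslie matrix

$$L = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix} \tag{4}$$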



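From Equation 3 it follows that

$$\begin{aligned} \mathbf{x}^{(1)} &= L\mathbf{x}^{(0)} \\ \mathbf{x}^{(2)} &= L\mathbf{x}^{(1)} = L^2\mathbf{x}^{(0)} \\ \mathbf{x}^{(3)} &= L\mathbf{x}^{(2)} = L^3\mathbf{x}^{(0)} \\ &\;\;\vdots \\ \mathbf{x}^{(k)} &= L\mathbf{x}^{(k-1)} = L^k\mathbf{x}^{(0)} \end{aligned} \tag{5}$$

Thus, if we know the initial age distribution $\mathbf{x}^{(0)}$ and the Leslie matrix $L$, we can determine the female age
distribution at any later time.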

EXAMPLE 1 Female Age Distribution for Animals 


Suppose that the oldest age attained by the females in a certain animal population is 15 years 
and we divide the population into three age classes with equal durations of five years. Let the 
Leslie matrix for this population be 


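$$L = \begin{bmatrix} 0 & 4 & 3 \\ \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \end{bmatrix}$$

If there are initially 1000 females in each of the three age classes, then from Equation 3 we
have

$$\mathbf{x}^{(0)} = \begin{bmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{bmatrix}$$

$$\mathbf{x}^{(1)} = L\mathbf{x}^{(0)} = \begin{bmatrix} 0 & 4 & 3 \\ \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \end{bmatrix} \begin{bmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{bmatrix} = \begin{bmatrix} 7{,}000 \\ 500 \\ 250 \end{bmatrix}$$

$$\mathbf{x}^{(2)} = L\mathbf{x}^{(1)} = \begin{bmatrix} 0 & 4 & 3 \\ \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \end{bmatrix} \begin{bmatrix} 7{,}000 \\ 500 \\ 250 \end{bmatrix} = \begin{bmatrix} 2{,}750 \\ 3{,}500 \\ 125 \end{bmatrix}$$

$$\mathbf{x}^{(3)} = L\mathbf{x}^{(2)} = \begin{bmatrix} 0 & 4 & 3 \\ \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \end{bmatrix} \begin{bmatrix} 2{,}750 \\ 3{,}500 \\ 125 \end{bmatrix} = \begin{bmatrix} 14{,}375 \\ 1{,}375 \\ 875 \end{bmatrix}$$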

Thus, after 15 years there are 14,375 females between 0 and 5 years of age, 1375 females 
between 5 and 10 years of age, and 875 females between 10 and 15 years of age. 
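The three matrix-vector products above can be reproduced with a short sketch, assuming Python with NumPy (the text itself does the arithmetic by hand). It applies Equation 3 repeatedly and then checks Equation 5 by computing $L^3\mathbf{x}^{(0)}$ in one step.

```python
import numpy as np

# Leslie matrix and initial age distribution from Example 1
L = np.array([[0.0, 4.0, 3.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.25, 0.0]])
x0 = np.array([1000.0, 1000.0, 1000.0])

# Equation 3: x^(k) = L x^(k-1), applied three times (one step = 5 years)
xs = [x0]
for k in range(1, 4):
    xs.append(L @ xs[-1])
    print(f"x^({k}) =", xs[k])

# Equation 5: the same vector in one step, x^(3) = L^3 x^(0)
x3_direct = np.linalg.matrix_power(L, 3) @ x0
```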
























Limiting Behavior 


Although Equation 5 gives the age distribution of the population at any time, it does not immediately give a
general picture of the dynamics of the growth process. For this we need to investigate the eigenvalues and
eigenvectors of the Leslie matrix. The eigenvalues of $L$ are the roots of its characteristic polynomial. As we
ask you to verify in Exercise 2, this characteristic polynomial is

$$p(\lambda) = |\lambda I - L| = \lambda^n - a_1\lambda^{n-1} - a_2 b_1 \lambda^{n-2} - a_3 b_1 b_2 \lambda^{n-3} - \cdots - a_n b_1 b_2 \cdots b_{n-1}$$

To analyze the roots of this polynomial, it will be convenient to introduce the function

$$q(\lambda) = \frac{a_1}{\lambda} + \frac{a_2 b_1}{\lambda^2} + \frac{a_3 b_1 b_2}{\lambda^3} + \cdots + \frac{a_n b_1 b_2 \cdots b_{n-1}}{\lambda^n} \tag{6}$$

Using this function, the characteristic equation $p(\lambda) = 0$ can be written (verify)

$$q(\lambda) = 1 \quad \text{for } \lambda \ne 0 \tag{7}$$


Because all the $a_i$ and $b_i$ are nonnegative and at least one $a_i$ is positive, we see that $q(\lambda)$ is monotonically
decreasing for $\lambda$ greater than zero. Furthermore, $q(\lambda)$ has a vertical asymptote at $\lambda = 0$ and approaches zero as
$\lambda \to \infty$. Consequently, as Figure 10.17.1 indicates, there is a unique $\lambda$, say $\lambda = \lambda_1$, such that $q(\lambda_1) = 1$. That is, the
matrix $L$ has a unique positive eigenvalue. It can also be shown (see Exercise 3) that $\lambda_1$ has multiplicity 1; that is,
$\lambda_1$ is not a repeated root of the characteristic equation. Although we omit the computational details, you can verify
that an eigenvector corresponding to $\lambda_1$ is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \\ b_1 b_2 b_3/\lambda_1^3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{8}$$

Because $\lambda_1$ has multiplicity 1, its corresponding eigenspace has dimension 1 (Exercise 3), and so any
eigenvector corresponding to it is some multiple of $\mathbf{x}_1$. We can summarize these results in the following
theorem.










Figure 10.17.1 


THEOREM 10.17.1 Existence of a Positive Eigenvalue

A Leslie matrix $L$ has a unique positive eigenvalue $\lambda_1$. This eigenvalue has multiplicity 1 and an
eigenvector $\mathbf{x}_1$ all of whose entries are positive.
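Because $q(\lambda)$ is strictly decreasing on $(0, \infty)$ and takes the value 1 exactly once, $\lambda_1$ can be located by simple bisection on Equation 7, after which Equation 8 gives $\mathbf{x}_1$ directly. The sketch below is one possible implementation, assuming Python with NumPy; the function names `q`, `positive_eigenvalue`, `leslie_eigenvector`, and `leslie_matrix` are our own, and the Leslie matrix of Example 1 serves as a test case.

```python
import numpy as np

def q(lam, a, b):
    """Equation 6: a1/lam + a2 b1/lam^2 + ... + an b1...b_{n-1}/lam^n."""
    total, survival = 0.0, 1.0          # survival holds b1 * ... * b_{i-1}
    for i, a_i in enumerate(a):
        total += a_i * survival / lam ** (i + 1)
        if i < len(b):
            survival *= b[i]
    return total

def positive_eigenvalue(a, b, tol=1e-12):
    """Solve q(lam) = 1 (Equation 7) by bisection; q is decreasing for lam > 0."""
    hi = 1.0
    while q(hi, a, b) > 1:              # grow hi until q(hi) <= 1
        hi *= 2
    lo = hi / 2
    while q(lo, a, b) < 1:              # shrink lo until q(lo) >= 1
        lo /= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if q(mid, a, b) > 1:
            lo = mid                    # the root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2

def leslie_eigenvector(lam, b):
    """Equation 8: (1, b1/lam, b1 b2/lam^2, ..., b1...b_{n-1}/lam^{n-1})."""
    x = [1.0]
    for b_i in b:
        x.append(x[-1] * b_i / lam)
    return np.array(x)

def leslie_matrix(a, b):
    """Equation 4: birth parameters on the first row, survival fractions below the diagonal."""
    n = len(a)
    L = np.zeros((n, n))
    L[0, :] = a
    L[np.arange(1, n), np.arange(n - 1)] = b
    return L

a, b = [0, 4, 3], [1/2, 1/4]            # Example 1
lam1 = positive_eigenvalue(a, b)
x1 = leslie_eigenvector(lam1, b)
print(lam1, x1)
print(np.allclose(leslie_matrix(a, b) @ x1, lam1 * x1))
```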


We will now show that the long-term behavior of the age distribution of the population is determined by the
positive eigenvalue $\lambda_1$ and its eigenvector $\mathbf{x}_1$. In Exercise 9 we ask you to prove the following result.

THEOREM 10.17.2 Eigenvalues of a Leslie Matrix

If $\lambda_1$ is the unique positive eigenvalue of a Leslie matrix $L$, and $\lambda_k$ is any other real or complex
eigenvalue of $L$, then $|\lambda_k| \le \lambda_1$.

For our purposes the conclusion in Theorem 10.17.2 is not strong enough; we need $\lambda_1$ to satisfy $|\lambda_k| < \lambda_1$. In
this case $\lambda_1$ would be called the dominant eigenvalue of $L$. However, as the following example shows, not all
Leslie matrices satisfy this condition.

EXAMPLE 2 Leslie Matrix with No Dominant Eigenvalue 


Let

$$L = \begin{bmatrix} 0 & 0 & 6 \\ \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{3} & 0 \end{bmatrix}$$

Then the characteristic polynomial of $L$ is

$$p(\lambda) = |\lambda I - L| = \lambda^3 - 1$$

The eigenvalues of $L$ are thus the solutions of $\lambda^3 = 1$, namely,

$$\lambda = 1, \quad -\frac{1}{2} + \frac{\sqrt{3}}{2}i, \quad -\frac{1}{2} - \frac{\sqrt{3}}{2}i$$

All three eigenvalues have absolute value 1, so the unique positive eigenvalue $\lambda_1 = 1$ is not
dominant. Note that this matrix has the property that $L^3 = I$. This means that for any choice of the
initial age distribution $\mathbf{x}^{(0)}$, we have

$$\mathbf{x}^{(0)} = \mathbf{x}^{(3)} = \mathbf{x}^{(6)} = \cdots = \mathbf{x}^{(3k)} = \cdots$$

The age distribution vector thus oscillates with a period of three time units. Such oscillations (or
population waves, as they are called) could not occur if $\lambda_1$ were dominant, as we will see below.
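A quick numerical check of Example 2, assuming Python with NumPy: it confirms $L^3 = I$, shows that all three eigenvalues have modulus 1, and projects an arbitrarily chosen (hypothetical) initial distribution to exhibit the period-3 population wave.

```python
import numpy as np

L = np.array([[0.0, 0.0, 6.0],
              [0.5, 0.0, 0.0],
              [0.0, 1/3, 0.0]])

print(np.allclose(np.linalg.matrix_power(L, 3), np.eye(3)))   # L^3 = I
moduli = np.abs(np.linalg.eigvals(L))
print(moduli)                                                  # all equal to 1

x0 = np.array([100.0, 200.0, 300.0])   # hypothetical initial age distribution
xs = [x0]
for _ in range(6):
    xs.append(L @ xs[-1])              # Equation 3
# xs[3] and xs[6] coincide with xs[0]; xs[1] and xs[2] do not
```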


It is beyond the scope of this book to discuss necessary and sufficient conditions for $\lambda_1$ to be a dominant
eigenvalue. However, we will state the following sufficient condition without proof.

THEOREM 10.17.3 Dominant Eigenvalue

If two successive entries $a_i$ and $a_{i+1}$ in the first row of a Leslie matrix $L$ are nonzero, then the
positive eigenvalue of $L$ is dominant.

Thus, if the female population has two successive fertile age classes, then its Leslie matrix has a dominant
eigenvalue. This is always the case for realistic populations if the duration of the age classes is sufficiently
small. Note that in Example 2 there is only one fertile age class (the third), so the condition of Theorem
10.17.3 is not satisfied. In what follows, we always assume that the condition of Theorem 10.17.3 is satisfied.


Let us assume that $L$ is diagonalizable. This is not really necessary for the conclusions we will draw, but it
does simplify the arguments. In this case, $L$ has $n$ eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_n$, not necessarily distinct, and $n$
linearly independent eigenvectors, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, corresponding to them. In this listing we place the dominant
eigenvalue $\lambda_1$ first. We construct a matrix $P$ whose columns are the eigenvectors of $L$:

$$P = [\mathbf{x}_1 \mid \mathbf{x}_2 \mid \mathbf{x}_3 \mid \cdots \mid \mathbf{x}_n]$$


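The diagonalization of $L$ is then given by the equation

$$L = P \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix} P^{-1}$$

From this it follows that

$$L^k = P \begin{bmatrix} \lambda_1^k & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^k & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1}$$

for $k = 1, 2, \ldots$. For any initial age distribution vector $\mathbf{x}^{(0)}$ we then have

$$L^k\mathbf{x}^{(0)} = P \begin{bmatrix} \lambda_1^k & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^k & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1}\mathbf{x}^{(0)}$$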


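for $k = 1, 2, \ldots$. Dividing both sides of this equation by $\lambda_1^k$ and using the fact that $\mathbf{x}^{(k)} = L^k\mathbf{x}^{(0)}$, we have

$$\frac{1}{\lambda_1^k}\mathbf{x}^{(k)} = P \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & (\lambda_2/\lambda_1)^k & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & (\lambda_n/\lambda_1)^k \end{bmatrix} P^{-1}\mathbf{x}^{(0)} \tag{9}$$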


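Because $\lambda_1$ is the dominant eigenvalue, we have $|\lambda_i/\lambda_1| < 1$ for $i = 2, 3, \ldots, n$. It follows that

$$(\lambda_i/\lambda_1)^k \to 0 \text{ as } k \to \infty \quad \text{for } i = 2, 3, \ldots, n$$

Using this fact, we can take the limit of both sides of 9 to obtain

$$\lim_{k \to \infty}\left\{\frac{1}{\lambda_1^k}\mathbf{x}^{(k)}\right\} = P \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} P^{-1}\mathbf{x}^{(0)} \tag{10}$$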

Let us denote the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$ by the constant $c$. As we ask you to show in
Exercise 4, the right side of 10 can be written as $c\mathbf{x}_1$, where $c$ is a positive constant that depends only on the
initial age distribution vector $\mathbf{x}^{(0)}$. Thus, 10 becomes

$$\lim_{k \to \infty}\left\{\frac{1}{\lambda_1^k}\mathbf{x}^{(k)}\right\} = c\mathbf{x}_1 \tag{11}$$

Equation 11 gives us the approximation

$$\mathbf{x}^{(k)} \simeq c\lambda_1^k\mathbf{x}_1 \tag{12}$$

for large values of $k$. From 12 we also have

$$\mathbf{x}^{(k-1)} \simeq c\lambda_1^{k-1}\mathbf{x}_1 \tag{13}$$

Comparing Equations 12 and 13, we see that

$$\mathbf{x}^{(k)} \simeq \lambda_1\mathbf{x}^{(k-1)} \tag{14}$$

for large values of $k$. This means that for large values of time, each age distribution vector is a scalar multiple
of the preceding age distribution vector, the scalar being the positive eigenvalue of the Leslie matrix.
Consequently, the proportion of females in each of the age classes becomes constant. As we will see in the
following example, these limiting proportions can be determined from the eigenvector $\mathbf{x}_1$.

EXAMPLE 3 Example 1 Revisited 


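The Leslie matrix in Example 1 was

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \end{bmatrix}$$

Its characteristic polynomial is $p(\lambda) = \lambda^3 - 2\lambda - \frac{3}{8}$, and its positive eigenvalue is $\lambda_1 = \frac{3}{2}$ (verify). From
Equation 8 the corresponding eigenvector is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \end{bmatrix} = \begin{bmatrix} 1 \\ \frac{1}{3} \\ \frac{1}{18} \end{bmatrix}$$

From 14 we have

$$\mathbf{x}^{(k)} \simeq \tfrac{3}{2}\mathbf{x}^{(k-1)}$$

for large values of $k$. Hence, every five years the number of females in each of the three classes
will increase by about 50%, as will the total number of females in the population.

From 12 we have

$$\mathbf{x}^{(k)} \simeq c\left(\tfrac{3}{2}\right)^k \begin{bmatrix} 1 \\ \frac{1}{3} \\ \frac{1}{18} \end{bmatrix}$$

Consequently, eventually the females will be distributed among the three age classes in the ratios
$1 : \frac{1}{3} : \frac{1}{18}$. This corresponds to a distribution of 72% of the females in the first age class, 24% of the
females in the second age class, and 4% of the females in the third age class.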
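Equations 12 through 14 can be seen at work by simply iterating Equation 3 for many steps. The sketch below (Python with NumPy assumed; 100 iterations is an arbitrary choice) starts from the initial distribution of Example 1 and reports the one-step growth factor of the total population and the proportion of females in each age class, which should approach $\lambda_1 = 3/2$ and 72%, 24%, 4%.

```python
import numpy as np

L = np.array([[0.0, 4.0, 3.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.25, 0.0]])
x = np.array([1000.0, 1000.0, 1000.0])   # x^(0) from Example 1

for k in range(100):
    x_prev, x = x, L @ x                   # Equation 3

growth = x.sum() / x_prev.sum()            # Equation 14: x^(k) ~ lambda1 x^(k-1)
proportions = x / x.sum()                  # direction of x1 = (1, 1/3, 1/18)
print(round(growth, 6), np.round(proportions, 4))
```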


EXAMPLE 4 Female Age Distribution for Humans 

In this example we use birth and death parameters from the year 1965 for Canadian females.
Because few women over 50 years of age bear children, we restrict ourselves to the portion of the
female population between 0 and 50 years of age. The data are for 5-year age classes, so there are
a total of 10 age classes. Rather than writing out the $10 \times 10$ Leslie matrix in full, we list the birth
and death parameters as follows:














Age Interval    a_i        b_i
[0, 5)          0.00000    0.99651
[5, 10)         0.00024    0.99820
[10, 15)        0.05861    0.99802
[15, 20)        0.28608    0.99729
[20, 25)        0.44791    0.99694
[25, 30)        0.36399    0.99621
[30, 35)        0.22259    0.99460
[35, 40)        0.10457    0.99184
[40, 45)        0.02826    0.98700
[45, 50)        0.00240    -


Using numerical techniques, we can approximate the positive eigenvalue and corresponding
eigenvector by

$$\lambda_1 = 1.07622 \quad \text{and} \quad \mathbf{x}_1 = \begin{bmatrix} 1.00000 \\ 0.92594 \\ 0.85881 \\ 0.79641 \\ 0.73800 \\ 0.68364 \\ 0.63281 \\ 0.58482 \\ 0.53897 \\ 0.49429 \end{bmatrix}$$

Thus, if Canadian women continued to reproduce and die as they did in 1965, eventually every 5
years their numbers would increase by 7.622%. From the eigenvector $\mathbf{x}_1$, we see that, in the limit,
for every 100,000 females between 0 and 5 years of age, there will be 92,594 females between 5
and 10 years of age, 85,881 females between 10 and 15 years of age, and so forth.
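The phrase "using numerical techniques" can be made concrete with a general-purpose eigenvalue routine. This sketch assumes Python with NumPy, builds the $10 \times 10$ Leslie matrix from the table, and picks the eigenvalue with the largest real part; that eigenvalue must be $\lambda_1$, since by Theorem 10.17.2 every other eigenvalue satisfies $\operatorname{Re}\lambda_k \le |\lambda_k| \le \lambda_1$. The eigenvector is rescaled so that its first entry is 1, as in the text.

```python
import numpy as np

# Birth (a_i) and survival (b_i) parameters for 1965 Canadian females, 5-year classes
a = [0.00000, 0.00024, 0.05861, 0.28608, 0.44791,
     0.36399, 0.22259, 0.10457, 0.02826, 0.00240]
b = [0.99651, 0.99820, 0.99802, 0.99729, 0.99694,
     0.99621, 0.99460, 0.99184, 0.98700]

n = len(a)
L = np.zeros((n, n))
L[0, :] = a                                  # first row: birth parameters
L[np.arange(1, n), np.arange(n - 1)] = b     # subdiagonal: survival fractions

eigvals, eigvecs = np.linalg.eig(L)
k = np.argmax(eigvals.real)                  # index of the positive eigenvalue lambda1
lam1 = eigvals[k].real
x1 = eigvecs[:, k].real
x1 = x1 / x1[0]                              # scale so the first entry is 1

print(f"lambda1 = {lam1:.5f}")
print(np.round(x1, 5))
```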


Let us look again at Equation 12, which gives the age distribution vector of the population for large times:

$$\mathbf{x}^{(k)} \simeq c\lambda_1^k\mathbf{x}_1 \tag{15}$$

Three cases arise according to the value of the positive eigenvalue $\lambda_1$:

(i) The population is eventually increasing if $\lambda_1 > 1$.

(ii) The population is eventually decreasing if $\lambda_1 < 1$.

(iii) The population eventually stabilizes if $\lambda_1 = 1$.

The case $\lambda_1 = 1$ is particularly interesting because it determines a population that has zero population
growth. For any initial age distribution, the population approaches a limiting age distribution that is some
multiple of the eigenvector $\mathbf{x}_1$. From Equations 6 and 7, we see that $\lambda = 1$ is an eigenvalue if and only if

$$a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} = 1 \tag{16}$$

The expression

$$R = a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \tag{17}$$

is called the net reproduction rate of the population. (See Exercise 5 for a demographic interpretation of $R$.)
Thus, we can say that a population has zero population growth if and only if its net reproduction rate is 1.
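Equation 17 is a running sum in which each birth parameter is weighted by the probability of surviving to that age class. The sketch below (Python; the function name `net_reproduction_rate` is ours) computes it and checks the values against the answers given for Exercises 7 and 8.

```python
def net_reproduction_rate(a, b):
    """Equation 17: R = a1 + a2 b1 + a3 b1 b2 + ... + an b1 b2 ... b_{n-1}."""
    R, survival = 0.0, 1.0          # survival = b1 * ... * b_{i-1}
    for i, a_i in enumerate(a):
        R += a_i * survival
        if i < len(b):
            survival *= b[i]
    return R

# Example 1 (compare Exercise 7)
R_animal = net_reproduction_rate([0, 4, 3], [1/2, 1/4])

# Example 4, Canadian females in 1965 (compare Exercise 8)
R_canada = net_reproduction_rate(
    [0.00000, 0.00024, 0.05861, 0.28608, 0.44791,
     0.36399, 0.22259, 0.10457, 0.02826, 0.00240],
    [0.99651, 0.99820, 0.99802, 0.99729, 0.99694,
     0.99621, 0.99460, 0.99184, 0.98700])

print(R_animal, round(R_canada, 5))   # both exceed 1: both populations eventually grow
```

Since $R = q(1)$ and $q$ is decreasing, $R > 1$ exactly when $\lambda_1 > 1$, which is the link behind Exercise 6.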


Exercise Set 10.17 


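1. Suppose that a certain animal population is divided into two age classes and has a Leslie matrix

$$L = \begin{bmatrix} 1 & \frac{3}{2} \\ \frac{1}{2} & 0 \end{bmatrix}$$

(a) Calculate the positive eigenvalue $\lambda_1$ of $L$ and the corresponding eigenvector $\mathbf{x}_1$.

(b) Beginning with the initial age distribution vector

$$\mathbf{x}^{(0)} = \begin{bmatrix} 100 \\ 0 \end{bmatrix}$$

calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, $\mathbf{x}^{(3)}$, $\mathbf{x}^{(4)}$, and $\mathbf{x}^{(5)}$, rounding off to the nearest integer when necessary.

(c) Calculate $\mathbf{x}^{(6)}$ using the exact formula $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)}$ and using the approximation formula $\mathbf{x}^{(6)} \simeq \lambda_1\mathbf{x}^{(5)}$.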


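Answer:

(a) $\lambda_1 = \frac{3}{2}$, $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \frac{1}{3} \end{bmatrix}$

(b) $\mathbf{x}^{(1)} = \begin{bmatrix} 100 \\ 50 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} 175 \\ 50 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} 250 \\ 88 \end{bmatrix}$, $\mathbf{x}^{(4)} = \begin{bmatrix} 382 \\ 125 \end{bmatrix}$, $\mathbf{x}^{(5)} = \begin{bmatrix} 570 \\ 191 \end{bmatrix}$

(c) $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)} = \begin{bmatrix} 857 \\ 285 \end{bmatrix}$, $\mathbf{x}^{(6)} \simeq \lambda_1\mathbf{x}^{(5)} = \begin{bmatrix} 855 \\ 287 \end{bmatrix}$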


2. Find the characteristic polynomial of a general Leslie matrix given by Equation 4. 

3. (a) Show that the positive eigenvalue $\lambda_1$ of a Leslie matrix is always simple. Recall that a root $\lambda_0$ of a
polynomial $q(\lambda)$ is simple if and only if $q'(\lambda_0) \ne 0$.

(b) Show that the eigenspace corresponding to $\lambda_1$ has dimension 1.

4. Show that the right side of Equation 10 is $c\mathbf{x}_1$, where $c$ is the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$.

5. Show that the net reproduction rate R, defined by 17, can be interpreted as the average number of 
daughters born to a single female during her expected lifetime. 






















6. Show that a population is eventually decreasing if and only if its net reproduction rate is less than 1. 
Similarly, show that a population is eventually increasing if and only if its net reproduction rate is greater 
than 1. 

7. Calculate the net reproduction rate of the animal population in Example 1. 

Answer: 

2.375 

8. (For readers with a hand calculator) Calculate the net reproduction rate of the Canadian female 
population in Example 4. 

Answer: 

1.49611 

9. (For readers who have read Sections 10.1-10.3) Prove Theorem 10.17.2. [Hint: Write $\lambda_k = re^{i\theta}$,
substitute into 7, take the real parts of both sides, and show that $r \le \lambda_1$.]

Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


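T1. Consider the sequence of Leslie matrices

$$L_2 = \begin{bmatrix} 0 & a \\ b_1 & 0 \end{bmatrix}, \quad L_3 = \begin{bmatrix} 0 & 0 & a \\ b_1 & 0 & 0 \\ 0 & b_2 & 0 \end{bmatrix}, \quad L_4 = \begin{bmatrix} 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 \\ 0 & 0 & b_3 & 0 \end{bmatrix}, \quad L_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 & 0 \\ 0 & 0 & b_3 & 0 & 0 \\ 0 & 0 & 0 & b_4 & 0 \end{bmatrix}, \ldots$$

(a) Use a computer to show that

$$L_2^2 = I_2, \quad L_3^3 = I_3, \quad L_4^4 = I_4, \quad L_5^5 = I_5, \ldots$$

for a suitable choice of $a$ in terms of $b_1, b_2, \ldots, b_{n-1}$.

(b) From your results in part (a), conjecture a relationship between $a$ and $b_1, b_2, \ldots, b_{n-1}$ that will make
$L_n^n = I_n$, where

$$L_n = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & a \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & b_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix}$$

(c) Determine an expression for $p_n(\lambda) = |\lambda I_n - L_n|$ and use it to show that all eigenvalues of $L_n$ satisfy
$|\lambda| = 1$ when $a$ and $b_1, b_2, \ldots, b_{n-1}$ are related by the equation determined in part (b).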
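A sketch for parts (a) and (b), assuming Python with NumPy. It builds $L_n$ for a few arbitrarily chosen (hypothetical) survival fractions, sets $a$ according to the candidate relation $a\,b_1 b_2 \cdots b_{n-1} = 1$ (this is the conjecture being tested, not something the exercise states), and checks both $L_n^n = I_n$ and $|\lambda| = 1$ for every eigenvalue.

```python
import numpy as np

def cyclic_leslie(b):
    """L_n of Exercise T1 with a chosen so that a * b1 * ... * b_{n-1} = 1 (the conjecture)."""
    n = len(b) + 1
    L = np.zeros((n, n))
    L[0, -1] = 1.0 / np.prod(b)                  # a in the top-right corner
    L[np.arange(1, n), np.arange(n - 1)] = b     # b1, ..., b_{n-1} on the subdiagonal
    return L

# Hypothetical survival fractions for n = 2, 3, 4, 5
for b in ([0.5], [0.5, 0.8], [0.9, 0.6, 0.7], [0.95, 0.5, 0.4, 0.8]):
    L = cyclic_leslie(b)
    n = L.shape[0]
    power_is_identity = np.allclose(np.linalg.matrix_power(L, n), np.eye(n))
    unit_moduli = np.allclose(np.abs(np.linalg.eigvals(L)), 1.0)
    print(n, power_is_identity, unit_moduli)
```

Trying a value of $a$ that violates the relation (for example, doubling it) makes both checks fail, which is a quick way to see that the relation is not just sufficient in these cases.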


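T2. Consider the sequence of Leslie matrices

$$L_3 = \begin{bmatrix} a & ap & ap^2 \\ b & 0 & 0 \\ 0 & b & 0 \end{bmatrix}, \quad L_4 = \begin{bmatrix} a & ap & ap^2 & ap^3 \\ b & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & b & 0 \end{bmatrix}, \quad L_5 = \begin{bmatrix} a & ap & ap^2 & ap^3 & ap^4 \\ b & 0 & 0 & 0 & 0 \\ 0 & b & 0 & 0 & 0 \\ 0 & 0 & b & 0 & 0 \\ 0 & 0 & 0 & b & 0 \end{bmatrix}, \ldots,$$

$$L_n = \begin{bmatrix} a & ap & ap^2 & \cdots & ap^{n-2} & ap^{n-1} \\ b & 0 & 0 & \cdots & 0 & 0 \\ 0 & b & 0 & \cdots & 0 & 0 \\ 0 & 0 & b & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b & 0 \end{bmatrix}$$

where $0 \le p \le 1$, $0 < b \le 1$, and $1 \le a$.

(a) Choose a value for $n$ (say, $n = 8$). For various values of $a$, $b$, and $p$, use a computer to determine the
dominant eigenvalue of $L_n$, and then compare your results to the value of $a + bp$.

(b) Show that

$$p_n(\lambda) = |\lambda I_n - L_n| = \lambda^n - a\left(\frac{\lambda^n - (bp)^n}{\lambda - bp}\right)$$

which means that the eigenvalues of $L_n$ must satisfy

$$\lambda^{n+1} - (a + bp)\lambda^n + a(bp)^n = 0$$

(c) Can you now provide a rough proof to explain the fact that $\lambda_1 \simeq a + bp$?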
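For part (a), a sketch assuming Python with NumPy; the parameter triples below are arbitrary choices satisfying the stated constraints, with $n = 8$ as suggested. The dominant eigenvalue is taken as the eigenvalue of largest modulus.

```python
import numpy as np

def geometric_leslie(n, a, b, p):
    """L_n of Exercise T2: first row a, ap, ..., ap^(n-1); every subdiagonal entry b."""
    L = np.zeros((n, n))
    L[0, :] = a * p ** np.arange(n)
    L[np.arange(1, n), np.arange(n - 1)] = b
    return L

n = 8
results = []
for a, b, p in [(1.5, 0.8, 0.5), (1.0, 0.9, 0.3), (2.0, 0.5, 0.9)]:
    lam1 = np.max(np.abs(np.linalg.eigvals(geometric_leslie(n, a, b, p))))
    results.append((lam1, a + b * p))
    print(f"a={a}, b={b}, p={p}: dominant eigenvalue {lam1:.6f}, a + bp = {a + b * p:.6f}")
```

For these choices the two numbers agree to about five decimal places, with the eigenvalue slightly below $a + bp$, consistent with the equation in part (b).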

T3. Suppose that a population of mice has a Leslie matrix $L$ over a 1-month period and an initial age
distribution vector $\mathbf{x}^{(0)}$ given by


0 

4 

5 

0 


L = 


0 


0 


0 


_9 

10 


0 

0 

0 


4 

5 


0 -k I 7TZ 0 


_9 

10 


0 ? 
0 0 


1 

10 


0 0 0 0 0 


0 0 0 0 


-rr 0 0 0 


0 0 
0 


3_ 

10 


and

$$\mathbf{x}^{(0)} = \begin{bmatrix} 50 \\ 40 \\ 30 \\ 20 \\ 10 \\ 5 \end{bmatrix}$$


(a) Compute the net reproduction rate of the population. 

(b) Compute the age distribution vector after 100 months and 101 months, and show that the vector after 101
months is approximately a scalar multiple of the vector after 100 months.

(c) Compute the dominant eigenvalue of L and its corresponding eigenvector. How are they related to your 
results in part (b)? 

(d) Suppose you wish to control the mouse population by feeding it a substance that decreases its age-specific 
birthrates (the entries in the first row of L) by a constant fraction. What range of fractions would cause the 
population eventually to decrease? 
10.18 Harvesting of Animal Populations

In this section we employ the Leslie matrix model of population growth to model the sustainable harvesting 
of an animal population. We also examine the effect of harvesting different fractions of different age groups. 


Prerequisites 

Age-Specific Population Growth (Section 10.17) 


Harvesting 

In Section 10.17 we used the Leslie matrix model to examine the growth of a female population that was 
divided into discrete age classes. In this section, we investigate the effects of harvesting an animal population 
growing according to such a model. By harvesting we mean the removal of animals from the population. 
(The word harvesting is not necessarily a euphemism for “slaughtering”; the animals may be removed from 
the population for other purposes.) 

In this section we restrict ourselves to sustainable harvesting policies. By this we mean the following: 


DEFINITION 1 

A harvesting policy in which an animal population is periodically harvested is said to be sustainable 
if the yield of each harvest is the same and the age distribution of the population remaining after each 
harvest is the same. 


Thus, the animal population is not depleted by a sustainable harvesting policy; only the excess growth is 
removed. 

As in Section 10.17, we will discuss only the females of the population. If the number of males in each age 
class is equal to the number of females—a reasonable assumption for many populations—then our harvesting 
policies will also apply to the male portion of the population. 


The Harvesting Model 

Figure 10.18.1 illustrates the basic idea of the model. We begin with a population having a particular age 
distribution. It undergoes a growth period that will be described by the Leslie matrix. At the end of the growth 
period, a certain fraction of each age class is harvested in such a way that the unharvested population has the
same age distribution as the original population. This cycle repeats after each harvest so that the yield is
sustainable. The duration of the harvest is assumed to be short in comparison with the growth period so that 
any growth or change in the population during the harvest period can be neglected. 


[Figure 10.18.1: Population before growth period → growth → population after growth period, which is then split into a harvested portion and a portion that is not harvested.]

To describe this harvesting model mathematically, let

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

be the age distribution vector of the population at the beginning of the growth period. Thus $x_i$ is the number
of females in the $i$th class left unharvested. As in Section 10.17, we require that the duration of each age class
be identical with the duration of the growth period. For example, if the population is harvested once a year,
then the population is divided into 1-year age classes.


If $L$ is the Leslie matrix describing the growth of the population, then the vector $L\mathbf{x}$ is the age distribution
vector of the population at the end of the growth period, immediately before the periodic harvest. Let $h_i$, for
$i = 1, 2, \ldots, n$, be the fraction of females from the $i$th class that is harvested. We use these $n$ numbers to form
an $n \times n$ diagonal matrix

$$H = \begin{bmatrix} h_1 & 0 & 0 & \cdots & 0 \\ 0 & h_2 & 0 & \cdots & 0 \\ 0 & 0 & h_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & h_n \end{bmatrix}$$

which we will call the harvesting matrix. By definition, we have

$$0 \le h_i \le 1 \quad (i = 1, 2, \ldots, n)$$

That is, we can harvest none ($h_i = 0$), all ($h_i = 1$), or some fraction ($0 < h_i < 1$) of each of the $n$ classes.
Because the number of females in the $i$th class immediately before each harvest is the $i$th entry $(L\mathbf{x})_i$ of the
vector $L\mathbf{x}$, the $i$th entry of the column vector

$$HL\mathbf{x} = \begin{bmatrix} h_1(L\mathbf{x})_1 \\ h_2(L\mathbf{x})_2 \\ \vdots \\ h_n(L\mathbf{x})_n \end{bmatrix}$$

is the number of females harvested from the $i$th class.

From the definition of a sustainable harvesting policy, we have

$$\begin{bmatrix}\text{age distribution} \\ \text{at end of} \\ \text{growth period}\end{bmatrix} - [\text{harvest}] = \begin{bmatrix}\text{age distribution} \\ \text{at beginning of} \\ \text{growth period}\end{bmatrix}$$

or, mathematically,

$$L\mathbf{x} - HL\mathbf{x} = \mathbf{x} \tag{1}$$

If we write Equation 1 in the form

$$(I - H)L\mathbf{x} = \mathbf{x} \tag{2}$$

we see that $\mathbf{x}$ must be an eigenvector of the matrix $(I - H)L$ corresponding to the eigenvalue 1. As we will
now show, this places certain restrictions on the values of $h_i$ and $\mathbf{x}$.


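Suppose that the Leslie matrix of the population is

$$L = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix}$$

Then the matrix $(I - H)L$ is (verify)

$$(I - H)L = \begin{bmatrix} (1-h_1)a_1 & (1-h_1)a_2 & (1-h_1)a_3 & \cdots & (1-h_1)a_{n-1} & (1-h_1)a_n \\ (1-h_2)b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & (1-h_3)b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & (1-h_n)b_{n-1} & 0 \end{bmatrix} \tag{3}$$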


Thus, we see that $(I - H)L$ is a matrix with the same mathematical form as a Leslie matrix. In Section 10.17
we showed that a necessary and sufficient condition for a Leslie matrix to have 1 as an eigenvalue is that its
net reproduction rate also be 1 [see Eq. 16 of Section 10.17]. Calculating the net reproduction rate of
$(I - H)L$ and setting it equal to 1, we obtain (verify)

$$(1 - h_1)\left[a_1 + a_2 b_1(1 - h_2) + a_3 b_1 b_2(1 - h_2)(1 - h_3) + \cdots + a_n b_1 b_2 \cdots b_{n-1}(1 - h_2)(1 - h_3)\cdots(1 - h_n)\right] = 1 \tag{4}$$

This equation places a restriction on the allowable harvesting fractions. Only those values of $h_1, h_2, \ldots, h_n$
that satisfy 4 and that lie in the interval $[0, 1]$ can produce a sustainable yield.

If $h_1, h_2, \ldots, h_n$ do satisfy 4, then the matrix $(I - H)L$ has the desired eigenvalue $\lambda_1 = 1$. Furthermore, this
eigenvalue has multiplicity 1, because the positive eigenvalue of a Leslie matrix always has multiplicity 1
(Theorem 10.17.1). This means that there is only one linearly independent eigenvector $\mathbf{x}$ satisfying Equation
2. [See Exercise 3(b) of Section 10.17.] One possible choice for $\mathbf{x}$ is the following normalized eigenvector:

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1(1-h_2) \\ b_1 b_2(1-h_2)(1-h_3) \\ b_1 b_2 b_3(1-h_2)(1-h_3)(1-h_4) \\ \vdots \\ b_1 b_2 \cdots b_{n-1}(1-h_2)(1-h_3)\cdots(1-h_n) \end{bmatrix} \tag{5}$$
Any other solution $\mathbf{x}$ of 2 is a multiple of $\mathbf{x}_1$. Thus, the vector $\mathbf{x}_1$ determines the proportion of females within
each of the $n$ classes after a harvest under a sustainable harvesting policy. But there is an ambiguity in the
total number of females in the population after each harvest. This can be determined by some auxiliary
condition, such as an ecological or economic constraint. For example, for a population economically
supported by the harvester, the largest population the harvester can afford to raise between harvests would
determine the particular constant that $\mathbf{x}_1$ is multiplied by to produce the appropriate vector $\mathbf{x}$ in Equation 2.
For a wild population, the natural habitat of the population would determine how large the total population
could be between harvests.

Summarizing our results so far, we see that there is a wide choice in the values of $h_1, h_2, \ldots, h_n$ that will
produce a sustainable yield. But once these values are selected, the proportional age distribution of the
population after each harvest is uniquely determined by the normalized eigenvector $\mathbf{x}_1$ defined by Equation 5.
We now consider a few particular harvesting strategies of this type. 
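Equations 3 through 5 translate directly into code. The sketch below (Python with NumPy assumed; `harvest_condition` is a hypothetical name of our own) evaluates the left side of Equation 4 and the eigenvector of Equation 5 for a given vector of harvesting fractions, and then confirms Equation 2 numerically. As a hypothetical test case it borrows the Leslie matrix of Example 1 in Section 10.17 and harvests only class 1, at the fraction that makes Equation 4 hold.

```python
import numpy as np

def harvest_condition(a, b, h):
    """Left side of Equation 4 and the normalized eigenvector x1 of Equation 5."""
    bracket, survival = 0.0, 1.0   # survival = b1(1-h2) * ... * b_{i-1}(1-h_i)
    x1 = [1.0]
    for i, a_i in enumerate(a):
        bracket += a_i * survival
        if i < len(b):
            survival *= b[i] * (1 - h[i + 1])
            x1.append(survival)
    return (1 - h[0]) * bracket, np.array(x1)

# Hypothetical test case: Leslie matrix of Example 1 in Section 10.17,
# harvesting only class 1 at the fraction that makes Equation 4 hold.
a, b = [0.0, 4.0, 3.0], [0.5, 0.25]
h = np.array([1 - 1 / 2.375, 0.0, 0.0])

lhs, x1 = harvest_condition(a, b, h)
L = np.array([a, [b[0], 0, 0], [0, b[1], 0]])
I_minus_H = np.eye(3) - np.diag(h)            # the matrix I - H
print(lhs)                                    # 1 up to rounding: Equation 4 holds
print(np.allclose(I_minus_H @ L @ x1, x1))    # Equation 2: x1 is reproduced
```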


Uniform Harvesting 


With many populations it is difficult to distinguish or catch animals of specific ages. If animals are caught at 
random, we can reasonably assume that the same fraction of each age class is harvested. We therefore set 

$$h = h_1 = h_2 = \cdots = h_n$$

Equation 2 then reduces to (verify)

$$L\mathbf{x} = \left(\frac{1}{1 - h}\right)\mathbf{x}$$

Hence, $1/(1 - h)$ must be the unique positive eigenvalue $\lambda_1$ of the Leslie growth matrix $L$. That is,

$$\lambda_1 = \frac{1}{1 - h}$$

Solving for the harvesting fraction $h$, we obtain

$$h = 1 - (1/\lambda_1) \tag{6}$$

The vector $\mathbf{x}_1$, in this case, is the same as the eigenvector of $L$ corresponding to the eigenvalue $\lambda_1$. From
Equation 8 of Section 10.17, this is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \\ b_1 b_2 b_3/\lambda_1^3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{7}$$

From 6 we can see that the larger $\lambda_1$ is, the larger is the fraction of animals we can harvest without depleting
the population. Note that we need $\lambda_1 > 1$ in order for the harvesting fraction $h$ to lie in the interval $(0, 1)$.
This is to be expected, because $\lambda_1 > 1$ is the condition that the population be increasing.
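A small numerical illustration of Equations 6 and 7, assuming Python with NumPy and borrowing the Leslie matrix of Example 1 in Section 10.17 as a hypothetical harvested population (its positive eigenvalue is $3/2$, so Equation 6 gives $h = 1/3$).

```python
import numpy as np

L = np.array([[0.0, 4.0, 3.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.25, 0.0]])

eigvals, eigvecs = np.linalg.eig(L)
k = np.argmax(eigvals.real)          # index of the positive eigenvalue lambda1
lam1 = eigvals[k].real
h = 1 - 1 / lam1                     # Equation 6: uniform harvesting fraction
x1 = eigvecs[:, k].real
x1 = x1 / x1[0]                      # Equation 7 scaling: first entry 1

# Sustainability check, Equation 2 with H = hI: (1 - h) L x1 should equal x1
print(h, np.allclose((1 - h) * (L @ x1), x1))
```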

EXAMPLE 1 Harvesting Sheep 


For a certain species of domestic sheep in New Zealand with a growth period of 1 year, the
following Leslie matrix was found (see G. Caughley, “Parameters for Seasonally Breeding
Populations,” Ecology, 48, 1967, pp. 834-839).

$$L = \left[\begin{array}{cccccccccccc}
.000 & .045 & .391 & .472 & .484 & .546 & .543 & .502 & .468 & .459 & .433 & .421 \\
.845 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & .975 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & .965 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & .950 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & .926 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & .895 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & .850 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & .786 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .691 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .561 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .370 & 0
\end{array}\right]$$

The sheep have a lifespan of 12 years, so they are divided into 12 age classes of duration 1 year
each. By the use of numerical techniques, the unique positive eigenvalue of $L$ can be found to
be

$$\lambda_1 = 1.176$$

From Equation 6, the harvesting fraction $h$ is

$$h = 1 - (1/\lambda_1) = 1 - (1/1.176) = .150$$

Thus, the uniform harvesting policy is one in which 15.0% of the sheep from each of the 12
age classes is harvested every year. From 7 the age distribution vector of the sheep after each
harvest is proportional to

$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ 0.719 \\ 0.596 \\ 0.489 \\ 0.395 \\ 0.311 \\ 0.237 \\ 0.171 \\ 0.114 \\ 0.067 \\ 0.032 \\ 0.010 \end{bmatrix} \tag{8}$$


From 8 we see that for every 1000 sheep between 0 and 1 year of age that are not harvested, 
there are 719 sheep between 1 and 2 years of age, 596 sheep between 2 and 3 years of age, and 
so forth. 


Harvesting Only the Youngest Age Class 

In some populations only the youngest females are of any economic value, so the harvester seeks to harvest 
only the females from the youngest age class. Accordingly, let us set 

$$h_1 = h, \quad h_2 = h_3 = \cdots = h_n = 0$$

Equation 4 then reduces to

$$(1 - h)(a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1}) = 1$$

or

$$(1 - h)R = 1$$

where $R$ is the net reproduction rate of the population. [See Equation 17 of Section 10.17.] Solving for $h$, we
obtain

$$h = 1 - (1/R) \tag{9}$$

Note from this equation that a sustainable harvesting policy is possible only if $R > 1$. This is reasonable
because only if $R > 1$ is the population increasing. From Equation 5, the age distribution vector after each
harvest is proportional to the vector

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1 \\ b_1 b_2 \\ b_1 b_2 b_3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1} \end{bmatrix} \tag{10}$$

EXAMPLE 2 Sustainable Harvesting Policy 


Let us apply this type of sustainable harvesting policy to the sheep population in Example 1. 
For the net reproduction rate of the population we find 

$$\begin{aligned} R &= a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \\ &= (.000) + (.045)(.845) + \cdots + (.421)(.845)(.975)\cdots(.370) \\ &= 2.514 \end{aligned}$$

From Equation 9, the fraction of the first age class harvested is

$$h = 1 - (1/R) = 1 - (1/2.514) = .602$$

From Equation 10, the age distribution of the sheep population after the harvest is proportional
to the vector

$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ .845 \\ (.845)(.975) \\ (.845)(.975)(.965) \\ \vdots \\ (.845)(.975)\cdots(.370) \end{bmatrix} = \begin{bmatrix} 1.000 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \tag{11}$$


A direct calculation gives us the following (see also Exercise 3):

$$L\mathbf{x}_1 = \begin{bmatrix} 2.514 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \qquad (12)$$


The vector $L\mathbf{x}_1$ is the age distribution vector immediately before the harvest. The total of all entries in $L\mathbf{x}_1$ is 8.520, so the first entry 2.514 is 29.5% of the total. This means that immediately before each harvest, 29.5% of the population is in the youngest age class. Since 60.2% of this class is harvested, it follows that 17.8% (= 60.2% of 29.5%) of the entire sheep population is harvested each year. This can be compared with the uniform harvesting policy of Example 1, in which 15.0% of the sheep population is harvested each year.
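The youngest-class policy of Equations 9 and 10 and the yield calculation of Example 2 can be sketched the same way. This is again an illustrative sketch.

- It assumes a list-of-rows Leslie matrix, and the function name is hypothetical.
- The yield is computed as the difference between the herd just before the harvest ($L\mathbf{x}_1$) and just after it ($\mathbf{x}_1$), divided by the pre-harvest total.
- The last lines re-derive the 17.8% figure from the rounded vectors printed in Equations 11 and 12.

```python
def youngest_class_harvest(L):
    """Harvest only the first age class: R, h (Eq. 9), x1 (Eq. 10), L x1, and the yield."""
    n = len(L)
    x1 = [1.0]
    for i in range(1, n):
        x1.append(x1[-1] * L[i][i - 1])            # b_1 b_2 ... b_i
    # Net reproduction rate R = a_1 + a_2 b_1 + ... + a_n b_1 ... b_{n-1}
    R = sum(L[0][k] * x1[k] for k in range(n))
    if R <= 1:
        raise ValueError("a sustainable policy requires R > 1")
    h = 1.0 - 1.0 / R
    Lx1 = [sum(L[i][j] * x1[j] for j in range(n)) for i in range(n)]
    # After each harvest the herd is back to x1, so the harvest is sum(Lx1) - sum(x1).
    yield_fraction = (sum(Lx1) - sum(x1)) / sum(Lx1)
    return R, h, x1, Lx1, yield_fraction


L = [[0.0, 4.0, 3.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0]]
R, h, x1, Lx1, y = youngest_class_harvest(L)
print(f"R = {R:.3f}, h = {h:.3f}, yield = {y:.3f}")   # R = 2.375, h = 0.579, yield = 0.458

# Check Example 2 from the rounded vectors of Equations 11 and 12
x1_sheep = [1.000, 0.845, 0.824, 0.795, 0.755, 0.699,
            0.626, 0.532, 0.418, 0.289, 0.162, 0.060]
Lx1_sheep = [2.514] + x1_sheep[1:]
print(round((sum(Lx1_sheep) - sum(x1_sheep)) / sum(Lx1_sheep), 3))   # 0.178
```

The matrix values reproduce the answer to Exercise 1(b): a yield of 45.8% with 57.9% of the youngest class harvested.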


Optimal Sustainable Yield 

We saw in Example 1 that a sustainable harvesting policy in which the same fraction of each age class is harvested produces a yield of 15.0% of the sheep population. In Example 2 we saw that if only the youngest age class is harvested, the resulting yield is 17.8% of the population. There are many other possible
sustainable harvesting policies, and each generally provides a different yield. It would be of interest to find a 
sustainable harvesting policy that produces the largest possible yield. Such a policy is called an optimal 
sustainable harvesting policy, and the resulting yield is called the optimal sustainable yield. However, 
determining the optimal sustainable yield requires linear programming theory, which we will not discuss here. 
We refer you to the following result, which appears in J. R. Beddington and D. B. Taylor, “Optimum Age 
Specific Harvesting of a Population,” Biometrics, 29, 1973, pp. 801-809. 


Theorem 10.18.1 Optimal Sustainable Yield

An optimal sustainable harvesting policy is one in which either one or two age classes are harvested. 
If two age classes are harvested, then the older age class is completely harvested. 


As an illustration, it can be shown that the optimal sustainable yield of the sheep population is attained when

$$h_1 = 0.522, \qquad h_9 = 1.000 \qquad (13)$$

and all other values of $h_i$ are zero. Thus, 52.2% of the sheep between 0 and 1 year of age and all the sheep between 8 and 9 years of age are harvested. As we ask you to show in Exercise 2, the resulting optimal sustainable yield is 19.9% of the population.
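For a policy that harvests several classes, the bookkeeping of Example 2 carries over once the $(1 - h_i)$ factors are included. The Python sketch below rests on the following assumptions:

- As in the derivation of Equations 4 and 5, the herd grows by L each year and then class i loses the fraction $h_i$.
- $\mathbf{x}_1$ is built from Equation 5, and sustainability is tested through Equation 4.
- The sheep matrix is not reproduced in this section, so the example uses the Leslie matrix of Exercise 1.
- The function name and the sample policies are hypothetical.

```python
def evaluate_policy(L, h):
    """Return (sustainable, x1, Lx1, yield) for harvesting fractions h[0..n-1]."""
    n = len(L)
    x1 = [1.0]
    for i in range(1, n):
        # Equation 5: entry i+1 is b_1 ... b_i times (1 - h_2) ... (1 - h_{i+1})
        x1.append(x1[-1] * L[i][i - 1] * (1.0 - h[i]))
    Lx1 = [sum(L[i][j] * x1[j] for j in range(n)) for i in range(n)]
    # Equation 4: after harvesting, the first class must return to exactly 1
    sustainable = abs((1.0 - h[0]) * Lx1[0] - 1.0) < 1e-9
    # The yield is meaningful only when the policy is sustainable
    yield_fraction = (sum(Lx1) - sum(x1)) / sum(Lx1)
    return sustainable, x1, Lx1, yield_fraction


L = [[0.0, 4.0, 3.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0]]
for h in ([0.5, 0.0, 1.0], [11 / 19, 0.0, 0.0], [0.2, 0.0, 0.0]):
    ok, x1, Lx1, y = evaluate_policy(L, h)
    print(h, ok, round(y, 4))
```

The first policy mirrors the shape of Equations 13 (part of the youngest class plus all of an older class). The last policy fails the test in Equation 4.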


Exercise Set 10.18 


1. Let a certain animal population be divided into three 1-year age classes and have as its Leslie matrix

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \end{bmatrix}$$

(a) Find the yield and the age distribution vector after each harvest if the same fraction of each of the three age classes is harvested every year.

(b) Find the yield and the age distribution vector after each harvest if only the youngest age class is harvested every year. Also, find the fraction of the youngest age class that is harvested.


Answer:

(a) Yield = $33\tfrac{1}{3}\%$ of population; $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{3} \\ \tfrac{1}{18} \end{bmatrix}$

(b) Yield = 45.8% of population; $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{2} \\ \tfrac{1}{8} \end{bmatrix}$; harvest 57.9% of youngest age class


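2. For the optimal sustainable harvesting policy described by Equations 13, find the vector $\mathbf{x}_1$ that specifies the age distribution of the population after each harvest. Also calculate the vector $L\mathbf{x}_1$ and verify that the optimal sustainable yield is 19.9% of the population.


Answer:

$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ .845 \\ .824 \\ .795 \\ .755 \\ .699 \\ .626 \\ .532 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad L\mathbf{x}_1 = \begin{bmatrix} 2.090 \\ .845 \\ .824 \\ .795 \\ .755 \\ .699 \\ .626 \\ .532 \\ .418 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

$$\text{Yield} = \frac{1.090 + .418}{7.584} \times 100\% = 19.9\%$$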


3. Use Equation 10 to show that if only the first age class of an animal population is harvested,

$$L\mathbf{x}_1 - \mathbf{x}_1 = \begin{bmatrix} R - 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

where R is the net reproduction rate of the population.


4. If only the ith class of an animal population is to be periodically harvested ($i = 1, 2, \ldots, n$), find the corresponding harvesting fraction $h_i$.


Answer:

$$h_i = \frac{R - 1}{a_i b_1 b_2 \cdots b_{i-1} + \cdots + a_n b_1 b_2 \cdots b_{n-1}}$$

5. Suppose that all of the Jth class and a certain fraction $h_I$ of the Ith class of an animal population is to be periodically harvested ($1 \le I < J \le n$). Calculate $h_I$.


Answer:

$$h_I = \frac{a_1 + a_2 b_1 + \cdots + a_{J-1} b_1 b_2 \cdots b_{J-2} - 1}{a_I b_1 b_2 \cdots b_{I-1} + \cdots + a_{J-1} b_1 b_2 \cdots b_{J-2}}$$


Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


T1. The results of Theorem 10.18.1 suggest the following algorithm for determining the optimal sustainable yield.

(i) For each value of $i = 1, 2, \ldots, n$, set $h_i = h$ and $h_k = 0$ for $k \neq i$ and calculate the respective yields. These n calculations give the one-age-class results. Of course, any calculation leading to a value of h not between 0 and 1 is rejected.

(ii) For each value of $i = 1, 2, \ldots, n - 1$ and $j = i + 1, i + 2, \ldots, n$, set $h_i = h$, $h_j = 1$, and $h_k = 0$ for $k \neq i, j$ and calculate the respective yields. These $\tfrac{1}{2}n(n - 1)$ calculations give the two-age-class results. Of course, any calculation leading to a value of h not between 0 and 1 is again rejected.

(iii) Of the yields calculated in parts (i) and (ii), the largest is the optimal sustainable yield. Note that there will be at most

$$n + \tfrac{1}{2}n(n - 1) = \tfrac{1}{2}n(n + 1)$$

calculations in all. Once again, some of these may lead to a value of h not between 0 and 1 and must therefore be rejected.

If we use this algorithm for the sheep example in the text, there will be at most $\tfrac{1}{2}(12)(12 + 1) = 78$ calculations to consider. Use a computer to do the two-age-class calculations for $h_1 = h$, $h_j = 1$, and $h_k = 0$ for $k \neq 1$ or $j$ for $j = 2, 3, \ldots, 12$. Construct a summary table consisting of the values of $h_1$ and the percentage yields using $j = 2, 3, \ldots, 12$, which will show that the largest of these yields occurs when $j = 9$.
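One way to carry out the algorithm of Exercise T1 is sketched below in Python; any of the utilities listed above would serve equally well. The sketch makes these assumptions:

- For each candidate policy, h is obtained by solving Equation 4 directly. This reproduces the formulas in the answers to Exercises 4 and 5.
- Candidates with h outside [0, 1], or with an undefined h, are rejected.
- The demonstration uses the matrix of Exercise 1, because the full sheep matrix is not reproduced here.
- All function names are hypothetical.

```python
def policy_yield(L, h):
    """Yield of a sustainable policy h: x1 from Equation 5, then L x1."""
    n = len(L)
    x1 = [1.0]
    for i in range(1, n):
        x1.append(x1[-1] * L[i][i - 1] * (1.0 - h[i]))
    Lx1 = [sum(L[i][j] * x1[j] for j in range(n)) for i in range(n)]
    return (sum(Lx1) - sum(x1)) / sum(Lx1)


def candidate_policies(L):
    """Steps (i)-(iii): all one- and two-age-class policies with a valid h."""
    n = len(L)
    survive = [1.0]                                    # b_1 b_2 ... b_k
    for k in range(1, n):
        survive.append(survive[-1] * L[k][k - 1])
    terms = [L[0][k] * survive[k] for k in range(n)]   # a_k b_1 ... b_{k-1}

    pairs = [(i, None) for i in range(n)]
    pairs += [(i, j) for i in range(n) for j in range(i + 1, n)]
    results = []
    for i, j in pairs:
        stop = n if j is None else j      # classes j, j+1, ... contribute nothing
        A = sum(terms[:i])                # terms not affected by h_i
        B = sum(terms[i:stop])            # terms multiplied by (1 - h_i)
        if B == 0:
            continue                      # h_i undefined: reject
        h_i = (A + B - 1.0) / B           # solves A + (1 - h_i) B = 1
        if not 0.0 <= h_i <= 1.0:
            continue                      # rejected, as in steps (i) and (ii)
        h = [0.0] * n
        h[i] = h_i
        if j is not None:
            h[j] = 1.0
        label = (i + 1,) if j is None else (i + 1, j + 1)
        results.append((label, h_i, policy_yield(L, h)))
    return results


L = [[0.0, 4.0, 3.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0]]
table = candidate_policies(L)
for label, h_i, y in table:
    print(f"classes {label}: h = {h_i:.3f}, yield = {100 * y:.1f}%")
best = max(table, key=lambda row: row[2])
print("optimal:", best[0], f"{100 * best[2]:.1f}%")    # optimal: (1,) 45.8%
```

For this small matrix the best policy turns out to be a one-age-class policy. That does not contradict Theorem 10.18.1, which allows either one or two harvested classes.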


T2. Using the algorithm in Exercise T1, do the one-age-class calculations for $h_i = h$ and $h_k = 0$ for $k \neq i$ for $i = 1, 2, \ldots, 12$. Construct a summary table consisting of the values of $h_i$ and the percentage yields using $i = 1, 2, \ldots, 12$, which will show that the largest of these yields occurs when $i = 9$.


T3. Referring to the mouse population in Exercise T3 of Section 10.17, suppose that reducing the birthrates 
is not practical, so you instead decide to control the population by uniformly harvesting all of the age classes 
monthly. 

(a) What fraction of the population must be harvested monthly to bring the mouse population to equilibrium 
eventually? 

(b) What is the equilibrium age distribution vector under this uniform harvesting policy? 

(c) The total number of mice in the original mouse population was 155. What would be the total number of 
mice after 5, 10, and 200 months under your uniform harvesting policy?





10.19 A Least Squares Model for Human Hearing 

In this section we apply the method of least squares approximation to a model for human hearing. The use of this 
method is motivated by energy considerations. 


Prerequisites 

Inner Product Spaces 
Orthogonal Projection 
Fourier Series (Section 6.6) 


Anatomy of the Ear 


We begin with a brief discussion of the nature of sound and human hearing. Figure 10.19.1 is a schematic diagram 
of the ear showing its three main components: the outer ear, middle ear, and inner ear. Sound waves enter the outer 
ear where they are channeled to the eardrum, causing it to vibrate. Three tiny bones in the middle ear mechanically 
link the eardrum with the snail-shaped cochlea within the inner ear. These bones pass on the vibrations of the 
eardrum to a fluid within the cochlea. The cochlea contains thousands of minute hairs that oscillate with the fluid. 
Those near the entrance of the cochlea are stimulated by high frequencies, and those near the tip are stimulated by 
low frequencies. The movements of these hairs activate nerve cells that send signals along various neural pathways 
to the brain, where the signals are interpreted as sound. 


Figure 10.19.1 (labels: sound wave; auditory nerve, to brain)


The sound waves themselves are variations in time of the air pressure. For the auditory system, the most 
elementary type of sound wave is a sinusoidal variation in the air pressure. This type of sound wave stimulates the 
hairs within the cochlea in such a way that a nerve impulse along a single neural pathway is produced (Figure 
10.19.2). A sinusoidal sound wave can be described by a function of time 







$$q(t) = A_0 + A\sin(\omega t - \delta) \qquad (1)$$

where q(t) is the atmospheric pressure at the eardrum, $A_0$ is the normal atmospheric pressure, A is the maximum deviation of the pressure from the normal atmospheric pressure, $\omega/2\pi$ is the frequency of the wave in cycles per second, and $\delta$ is the phase angle of the wave. To be perceived as sound, such sinusoidal waves must have frequencies within a certain range. For humans this range is roughly 20 cycles per second (cps) to 20,000 cps. Frequencies outside this range will not stimulate the hairs within the cochlea enough to produce nerve signals.



Figure 10.19.2 


To a reasonable degree of accuracy, the ear is a linear system. This means that if a complex sound wave is a finite 
sum of sinusoidal components of different amplitudes, frequencies, and phase angles, say, 

$$q(t) = A_0 + A_1\sin(\omega_1 t - \delta_1) + A_2\sin(\omega_2 t - \delta_2) + \cdots + A_n\sin(\omega_n t - \delta_n) \qquad (2)$$

then the response of the ear consists of nerve impulses along the same neural pathways that would be stimulated by 
the individual components (Figure 10.19.3). 



Figure 10.19.3 

Let us now consider some periodic sound wave p(t) with period T [i.e., p(t) = p(t + T)] that is not a finite sum of sinusoidal waves. If we examine the response of the ear to such a periodic wave, we find that it is the same as the response to some wave that is the sum of sinusoidal waves. That is, there is some sound wave q(t) as given by Equation 2 that produces the same response as p(t), even though p(t) and q(t) are different functions of time.

We now want to determine the frequencies, amplitudes, and phase angles of the sinusoidal components of q(t). Because q(t) produces the same response as the periodic wave p(t), it is reasonable to expect that q(t) has the same period T as p(t). This requires that each sinusoidal term in q(t) have period T. Consequently, the frequencies of the sinusoidal components must be integer multiples of the basic frequency 1/T of the function p(t). Thus, the $\omega_k$ in Equation 2 must be of the form

$$\omega_k = \frac{2k\pi}{T}, \quad k = 1, 2, \ldots$$

But because the ear cannot perceive sinusoidal waves with frequencies greater than 20,000 cps, we may omit those values of k for which $\omega_k/2\pi = k/T$ is greater than 20,000. Thus, q(t) is of the form

$$q(t) = A_0 + A_1\sin\left(\frac{2\pi t}{T} - \delta_1\right) + A_2\sin\left(\frac{4\pi t}{T} - \delta_2\right) + \cdots + A_n\sin\left(\frac{2n\pi t}{T} - \delta_n\right) \qquad (3)$$

where n is the largest integer such that n/T is not greater than 20,000.

We now turn our attention to the values of the amplitudes $A_0, A_1, \ldots, A_n$ and the phase angles $\delta_1, \delta_2, \ldots, \delta_n$ that appear in Equation 3. There is some criterion by which the auditory system "picks" these values so that q(t) produces the same response as p(t). To examine this criterion, let us set

$$e(t) = p(t) - q(t)$$

If we consider q(t) as an approximation to p(t), then e(t) is the error in this approximation, an error that the ear cannot perceive. In terms of e(t), the criterion for the determination of the amplitudes and the phase angles is that the quantity

$$\int_0^T [e(t)]^2\,dt = \int_0^T [p(t) - q(t)]^2\,dt \qquad (4)$$

be as small as possible. We cannot go into the physiological reasons for this, but we note that this expression is proportional to the acoustic energy of the error wave e(t) over one period. In other words, it is the energy of the difference between the two sound waves p(t) and q(t) that determines whether the ear perceives any difference between them. If this energy is as small as possible, then the two waves produce the same sensation of sound. Mathematically, the function q(t) in Equation 4 is the least squares approximation to p(t), within the vector space C[0, T] of continuous functions on the interval [0, T], by functions of the form in Equation 3. (See Section 6.6.)


Least squares approximations by continuous functions arise in a wide variety of engineering and scientific 
approximation problems. Apart from the acoustics problem just discussed, some other examples follow. 

Let S(x) be the axial strain distribution in a uniform rod lying along the x-axis from x = 0 to x = l (Figure 10.19.4). The strain energy in the rod is proportional to the integral

$$\int_0^l [S(x)]^2\,dx$$

The closeness of an approximation q(x) to S(x) can be judged according to the strain energy of the difference of the two strain distributions. That energy is proportional to

$$\int_0^l [S(x) - q(x)]^2\,dx$$

which is a least squares criterion.

Let E(t) be a periodic voltage across a resistor in an electrical circuit (Figure 10.19.5). The electrical energy transferred to the resistor during one period T is proportional to

$$\int_0^T [E(t)]^2\,dt$$

If q(t) has the same period as E(t) and is to be an approximation to E(t), then the criterion of closeness might be taken as the energy of the difference voltage. This is proportional to

$$\int_0^T [E(t) - q(t)]^2\,dt$$

which is again a least squares criterion.

Let y(x) be the vertical displacement of a uniform flexible string whose equilibrium position is along the x-axis from x = 0 to x = l (Figure 10.19.6). The elastic potential energy of the string is proportional to

$$\int_0^l [y'(x)]^2\,dx$$

If q(x) is to be an approximation to the displacement, then as before, the energy integral

$$\int_0^l [y'(x) - q'(x)]^2\,dx$$

determines a least squares criterion for the closeness of the approximation.
Figure 10.19.4

Figure 10.19.5



Least squares approximation is also used in situations where there is no a priori justification for its use, such as for 
approximating business cycles, population growth curves, sales curves, and so forth. It is used in these cases 
because of its mathematical simplicity. In general, if no other error criterion is immediately apparent for an 
approximation problem, the least squares criterion is the one most often chosen. 


The following result was obtained in Section 6.6. 












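Theorem 10.19.1 Minimizing Mean Square Error on [0, 2π]

If f(t) is continuous on [0, 2π], then the trigonometric function g(t) of the form

$$g(t) = \tfrac{1}{2}a_0 + a_1\cos t + \cdots + a_n\cos nt + b_1\sin t + \cdots + b_n\sin nt$$

that minimizes the mean square error

$$\int_0^{2\pi} [f(t) - g(t)]^2\,dt$$

has coefficients

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt, \quad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt, \quad k = 1, 2, \ldots, n$$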


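If the original function f(t) is defined over the interval [0, T] instead of [0, 2π], a change of scale will yield the following result (see Exercise 8):

Theorem 10.19.2 Minimizing Mean Square Error on [0, T]

If f(t) is continuous on [0, T], then the trigonometric function g(t) of the form

$$g(t) = \tfrac{1}{2}a_0 + a_1\cos\frac{2\pi}{T}t + \cdots + a_n\cos\frac{2n\pi}{T}t + b_1\sin\frac{2\pi}{T}t + \cdots + b_n\sin\frac{2n\pi}{T}t$$

that minimizes the mean square error

$$\int_0^T [f(t) - g(t)]^2\,dt$$

has coefficients

$$a_k = \frac{2}{T}\int_0^T f(t)\cos\frac{2k\pi t}{T}\,dt, \quad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{2}{T}\int_0^T f(t)\sin\frac{2k\pi t}{T}\,dt, \quad k = 1, 2, \ldots, n$$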
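The coefficient integrals of Theorem 10.19.2 are easy to approximate numerically when they are awkward by hand. The following Python sketch makes these assumptions:

- It uses the composite trapezoidal rule. That is our choice of quadrature; the text does not specify one.
- The default of 20,000 subintervals is a hypothetical setting.
- Setting T = 2π recovers Theorem 10.19.1.

The sample call uses $f(t) = (t - \pi)^2$, whose order-3 approximation appears in the answer to Exercise 1 of this section.

```python
import math


def fourier_coefficients(f, T, n, samples=20000):
    """Approximate a_0..a_n and b_0..b_n of Theorem 10.19.2 (b_0 is always 0).

    Each integral (2/T) * int_0^T f(t) cos(2 k pi t / T) dt, and its sine
    counterpart, is evaluated by the composite trapezoidal rule on
    `samples` equal subintervals of [0, T].
    """
    dt = T / samples
    ts = [m * dt for m in range(samples + 1)]
    fs = [f(t) for t in ts]
    weights = [0.5 if m in (0, samples) else 1.0 for m in range(samples + 1)]

    def coefficient(trig, k):
        total = sum(w * fv * trig(2 * k * math.pi * t / T)
                    for w, fv, t in zip(weights, fs, ts))
        return 2.0 / T * total * dt

    a = [coefficient(math.cos, k) for k in range(n + 1)]
    b = [coefficient(math.sin, k) for k in range(n + 1)]
    return a, b


a, b = fourier_coefficients(lambda t: (t - math.pi) ** 2, 2 * math.pi, 3)
print([round(v, 4) for v in a])   # a_0/2 should be near pi^2/3; a_1, a_2, a_3 near 4, 1, 4/9
print([round(v, 4) for v in b])   # all near 0
```

The trapezoidal rule is a natural default here because it reuses the same samples of f for every k.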




EXAMPLE 1 Least Squares Approximation to a Sound Wave

Let a sound wave p(t) have a saw-tooth pattern with a basic frequency of 5000 cps (Figure 10.19.7). Assume units are chosen so that the normal atmospheric pressure is at the zero level and the maximum amplitude of the wave is A. The basic period of the wave is T = 1/5000 = .0002 second. From t = 0 to t = T the function p(t) has the equation

$$p(t) = \frac{2A}{T}\left(\frac{T}{2} - t\right)$$

Theorem 10.19.2 then yields the following (verify):

$$a_0 = \frac{2}{T}\int_0^T p(t)\,dt = 0$$

$$a_k = \frac{2}{T}\int_0^T p(t)\cos\frac{2k\pi t}{T}\,dt = 0, \quad k = 1, 2, \ldots$$

$$b_k = \frac{2}{T}\int_0^T p(t)\sin\frac{2k\pi t}{T}\,dt = \frac{2A}{k\pi}, \quad k = 1, 2, \ldots$$

We can now investigate how the sound wave p(t) is perceived by the human ear. We note that 4/T = 20,000 cps, so we need only go up to k = 4 in the formulas above. The least squares approximation to p(t) is then

$$q(t) = \frac{2A}{\pi}\left[\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right]$$

The four sinusoidal terms have frequencies of 5000, 10,000, 15,000, and 20,000 cps, respectively. In Figure 10.19.8 we have plotted p(t) and q(t) over one period. Although q(t) is not a very good point-by-point approximation to p(t), to the ear both p(t) and q(t) produce the same sensation of sound.
Figure 10.19.7
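The numbers in Example 1 can be checked numerically. The Python sketch below makes these assumptions:

- It takes the hypothetical value A = 1; the text keeps A symbolic.
- It keeps only the harmonics whose frequency k/T is at most 20,000 cps.
- It approximates each $b_k$ with the midpoint rule, using an arbitrary 20,000 subintervals.

It then compares each $b_k$ with $2A/(k\pi)$.

```python
import math

A = 1.0                  # hypothetical amplitude; the text leaves A symbolic
T = 1 / 5000             # basic period in seconds (5000 cps)
N = 20000                # midpoint-rule subintervals (an arbitrary choice)


def p(t):
    """Saw-tooth wave of Example 1 on one period [0, T]."""
    return 2 * A / T * (T / 2 - t)


# Largest n with n / T <= 20,000 cps (the tiny offset guards against rounding)
n = math.floor(20000 * T + 1e-9)
print("harmonics kept:", n)                 # harmonics kept: 4

dt = T / N
mids = [(m + 0.5) * dt for m in range(N)]
b = {}
for k in range(1, n + 1):
    integral = sum(p(t) * math.sin(2 * k * math.pi * t / T) for t in mids) * dt
    b[k] = 2 / T * integral


def q(t):
    """Least squares approximation keeping only the audible harmonics."""
    return sum(b[k] * math.sin(2 * k * math.pi * t / T) for k in b)


for k in b:
    print(k, round(k / T), round(b[k], 6), round(2 * A / (k * math.pi), 6))
```

Each printed row shows k, the frequency in cps, the numerical $b_k$, and the closed form $2A/(k\pi)$.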



As discussed in Section 6.6, the least squares approximation becomes better as the number of terms in the approximating trigonometric polynomial becomes larger. More precisely,

$$\int_0^{2\pi}\left[f(t) - \tfrac{1}{2}a_0 - \sum_{k=1}^{n}(a_k\cos kt + b_k\sin kt)\right]^2 dt$$

tends to zero as n approaches infinity. We denote this by writing

$$f(t) \sim \tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

where the right side of this equation is the Fourier series of f(t). Whether the Fourier series of f(t) converges to f(t) for each t is another question, and a more difficult one. For most continuous functions encountered in applications, the Fourier series does indeed converge to its corresponding function for each value of t.
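The claim that the mean square error tends to zero can be watched numerically. This Python sketch makes these assumptions:

- It uses $f(t) = (t - \pi)^2$ on [0, 2π]. The Fourier coefficients of this function are $a_0 = 2\pi^2/3$, $a_k = 4/k^2$ and $b_k = 0$, which is consistent with the answer to Exercise 1 below.
- The integral is approximated with the trapezoidal rule.
- The sample values of n are arbitrary.

```python
import math


def partial_sum(t, n):
    """Order-n Fourier polynomial of f(t) = (t - pi)**2 on [0, 2*pi]."""
    return math.pi ** 2 / 3 + sum(4 / k ** 2 * math.cos(k * t) for k in range(1, n + 1))


def mean_square_error(n, samples=4000):
    """Trapezoidal approximation of the integral of [f - partial sum]^2 over [0, 2*pi]."""
    dt = 2 * math.pi / samples
    total = 0.0
    for m in range(samples + 1):
        t = m * dt
        weight = 0.5 if m in (0, samples) else 1.0
        total += weight * ((t - math.pi) ** 2 - partial_sum(t, n)) ** 2
    return total * dt


errors = {n: mean_square_error(n) for n in (1, 2, 4, 8, 16)}
for n, e in errors.items():
    print(n, f"{e:.5f}")
```

The printed errors shrink steadily as n grows, which is what the statement above predicts.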


Exercise Set 10.19 


1. Find the trigonometric polynomial of order 3 that is the least squares approximation to the function $f(t) = (t - \pi)^2$ over the interval [0, 2π].


Answer:

$$\frac{\pi^2}{3} + 4\cos t + \cos 2t + \frac{4}{9}\cos 3t$$

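2. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function $f(t) = t^2$ over the interval [0, T].

Answer:

$$\frac{T^2}{3} + \frac{T^2}{\pi^2}\left(\cos\frac{2\pi}{T}t + \frac{1}{4}\cos\frac{4\pi}{T}t + \frac{1}{9}\cos\frac{6\pi}{T}t + \frac{1}{16}\cos\frac{8\pi}{T}t\right) - \frac{T^2}{\pi}\left(\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right)$$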


3. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function f(t) over the interval [0, 2π], where

$$f(t) = \begin{cases} \sin t, & 0 \le t < \pi \\ 0, & \pi \le t \le 2\pi \end{cases}$$


Answer:

$$\frac{1}{\pi} + \frac{1}{2}\sin t - \frac{2}{3\pi}\cos 2t - \frac{2}{15\pi}\cos 4t$$

4. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function $f(t) = \sin\frac{1}{2}t$ over the interval [0, 2π].


Answer:

$$\frac{4}{\pi}\left[\frac{1}{2} - \frac{1}{1\cdot 3}\cos t - \frac{1}{3\cdot 5}\cos 2t - \frac{1}{5\cdot 7}\cos 3t - \cdots - \frac{1}{(2n-1)(2n+1)}\cos nt\right]$$

5. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function 
f(t) over the interval [0, T], where








Answer: 


4 X 2 


7 87 



6. For the inner product

$$\langle \mathbf{u}, \mathbf{v}\rangle = \int_0^{2\pi} u(t)v(t)\,dt$$

show that

(a) $\|1\| = \sqrt{2\pi}$

(b) $\|\cos kt\| = \sqrt{\pi}$ for $k = 1, 2, \ldots$

(c) $\|\sin kt\| = \sqrt{\pi}$ for $k = 1, 2, \ldots$


7. Show that the 2n + 1 functions

$$1, \cos t, \cos 2t, \ldots, \cos nt, \sin t, \sin 2t, \ldots, \sin nt$$

are orthogonal over the interval [0, 2π] relative to the inner product $\langle \mathbf{u}, \mathbf{v}\rangle$ defined in Exercise 6.

8. If f(t) is defined and continuous on the interval [0, T], show that $f(T\tau/2\pi)$ is defined and continuous for $\tau$ in the interval [0, 2π]. Use this fact to show how Theorem 10.19.2 follows from Theorem 10.19.1.


Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 

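T1. Let g be the function



for 0 < t < 2π. Use a computer to determine the Fourier coefficients

$$a_k = \frac{1}{\pi}\int_0^{2\pi} g(t)\cos kt\,dt, \qquad b_k = \frac{1}{\pi}\int_0^{2\pi} g(t)\sin kt\,dt$$

for k = 0, 1, 2, 3, 4, 5. From your results, make a conjecture about the general expressions for $a_k$ and $b_k$. Test your conjecture by calculating

$$\tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

on the computer and see whether it converges to g(t).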
T2. Let g be the function

$$g(t) = e^{\cos t}\,[\cos(\sin t) + \sin(\sin t)]$$

for 0 < t < 2π. Use a computer to determine the Fourier coefficients

$$a_k = \frac{1}{\pi}\int_0^{2\pi} g(t)\cos kt\,dt, \qquad b_k = \frac{1}{\pi}\int_0^{2\pi} g(t)\sin kt\,dt$$

for k = 0, 1, 2, 3, 4, 5. From your results, make a conjecture about the general expressions for $a_k$ and $b_k$. Test your conjecture by calculating

$$\tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

on the computer and see whether it converges to g(t).
10.20 Warps and Morphs

Among the more interesting image-manipulation techniques available for computer graphics are warps and 
morphs. In this section we show how linear transformations can be used to distort a single picture to produce 
a warp, or to distort and blend two pictures to produce a morph. 


Prerequisites 

Geometry of Linear Operators on R² (Section 4.11)
Linear Independence
Bases in R²


Computer graphics software enables you to manipulate an image in various ways, such as by scaling, rotating, 
or slanting the image. Distorting an image by separately moving the corners of a rectangle containing the
image is another basic image-manipulation technique. Distorting various pieces of an image in different ways 
is a more complicated procedure that results in a warp of the picture. In addition, warping two different 
images in complementary ways and blending the warps results in a morph of the two pictures (from the Greek 
root meaning “shape” or “form”). An example is Figure 10.20.1 in which four photographs of a woman taken 
over a 50-year period (the four diagonal pictures from top left to bottom right) have been pairwise morphed 
by different amounts to suggest the gradual aging of the woman. 



Figure 10.20.1 

The most visible application of warping and morphing images has been the production of special effects in 
motion pictures and television. However, many scientific and technological applications of such techniques 
have also arisen—for example, studying the evolution, growth, and development of living organisms, 
assisting in reconstructive and cosmetic surgery, exploring various designs of a product, and “aging” 
photographs of missing persons or police suspects. 


Warps 

We begin by describing a simple warp of a triangular region in the plane. Let the three vertices of a triangle be given by the three noncollinear points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ (Figure 10.20.2a). We will call this triangle the begin-triangle. If v is any point in the begin-triangle, then there are unique constants $c_1$ and $c_2$ such that

$$\mathbf{v} - \mathbf{v}_3 = c_1(\mathbf{v}_1 - \mathbf{v}_3) + c_2(\mathbf{v}_2 - \mathbf{v}_3) \qquad (1)$$



Equation 1 expresses the vector $\mathbf{v} - \mathbf{v}_3$ as a (unique) linear combination of the two linearly independent vectors $\mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{v}_2 - \mathbf{v}_3$ with respect to an origin at $\mathbf{v}_3$. If we set $c_3 = 1 - c_1 - c_2$, then we can rewrite Equation 1 as

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 \qquad (2)$$

where

$$c_1 + c_2 + c_3 = 1 \qquad (3)$$

from the definition of $c_3$. We say that v is a convex combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if Equations 2 and 3 are satisfied and, in addition, the coefficients $c_1$, $c_2$, and $c_3$ are nonnegative. It can be shown (Exercise 6) that v lies in the triangle determined by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if and only if it is a convex combination of those three vectors.
Figure 10.20.2
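Solving Equation 1 for $c_1$ and $c_2$ is a 2 × 2 linear system, so Cramer's rule gives them directly, and Equation 3 then supplies $c_3$. The Python sketch below does this. It then uses the nonnegativity condition to test whether a point lies in the triangle, which is the same question raised in Exercise 1. Points are plain (x, y) tuples, the function names are hypothetical, and the small tolerance for boundary points is an arbitrary choice.

```python
def barycentric(v, v1, v2, v3):
    """Return (c1, c2, c3) with v = c1*v1 + c2*v2 + c3*v3 and c1 + c2 + c3 = 1."""
    a11, a12 = v1[0] - v3[0], v2[0] - v3[0]     # columns: v1 - v3 and v2 - v3
    a21, a22 = v1[1] - v3[1], v2[1] - v3[1]
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the three vertices are collinear")
    rx, ry = v[0] - v3[0], v[1] - v3[1]         # right-hand side v - v3
    c1 = (rx * a22 - a12 * ry) / det            # Cramer's rule for Equation 1
    c2 = (a11 * ry - rx * a21) / det
    return c1, c2, 1.0 - c1 - c2                # Equation 3 defines c3


def in_triangle(v, v1, v2, v3, tol=1e-12):
    """v lies in the triangle exactly when it is a convex combination of v1, v2, v3."""
    return all(c >= -tol for c in barycentric(v, v1, v2, v3))


tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(barycentric((0.25, 0.25), *tri))                                # (0.5, 0.25, 0.25)
print(in_triangle((0.25, 0.25), *tri), in_triangle((1.0, 1.0), *tri)) # True False
```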

Next, given three noncollinear points $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ of an end-triangle (Figure 10.20.2b), there is a unique affine transformation that maps $\mathbf{v}_1$ to $\mathbf{w}_1$, $\mathbf{v}_2$ to $\mathbf{w}_2$, and $\mathbf{v}_3$ to $\mathbf{w}_3$. That is, there is a unique 2 × 2 invertible matrix M and a unique vector b such that

$$\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b}, \quad i = 1, 2, 3 \qquad (4)$$

(See Exercise 5 for the evaluation of M and b.) Moreover, it can be shown (Exercise 3) that the image w of the vector v in Equation 2 under this affine transformation is

$$\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 \qquad (5)$$


This is a basic property of affine transformations: They map a convex combination of vectors to the same 
convex combination of the images of the vectors. 
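Subtracting the i = 3 case of Equation 4 from the other two cases gives $M(\mathbf{v}_i - \mathbf{v}_3) = \mathbf{w}_i - \mathbf{w}_3$ for i = 1, 2. Hence $M = WV^{-1}$, where the columns of V and W are those edge vectors, and then $\mathbf{b} = \mathbf{w}_3 - M\mathbf{v}_3$.

The Python sketch below carries this out. It is one possible approach to Exercise 5, not necessarily the book's. It also checks the convex-combination property of Equation 5 at one point. The sample triangles are made up for illustration.

```python
def affine_from_triangles(vs, ws):
    """Return (M, b) with w_i = M v_i + b for i = 1, 2, 3 (Equation 4)."""
    (v1, v2, v3), (w1, w2, w3) = vs, ws
    # Columns of V and W: edge vectors measured from the third vertex
    V = [[v1[0] - v3[0], v2[0] - v3[0]], [v1[1] - v3[1], v2[1] - v3[1]]]
    W = [[w1[0] - w3[0], w2[0] - w3[0]], [w1[1] - w3[1], w2[1] - w3[1]]]
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    if det == 0:
        raise ValueError("begin-triangle vertices are collinear")
    V_inv = [[V[1][1] / det, -V[0][1] / det],
             [-V[1][0] / det, V[0][0] / det]]
    M = [[sum(W[r][k] * V_inv[k][c] for k in range(2)) for c in range(2)]
         for r in range(2)]
    b = (w3[0] - (M[0][0] * v3[0] + M[0][1] * v3[1]),
         w3[1] - (M[1][0] * v3[0] + M[1][1] * v3[1]))
    return M, b


def apply_affine(M, b, v):
    """The image M v + b of the point v."""
    return (M[0][0] * v[0] + M[0][1] * v[1] + b[0],
            M[1][0] * v[0] + M[1][1] * v[1] + b[1])


begin = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
end = ((2.0, 1.0), (4.0, 1.0), (2.0, 4.0))
M, b = affine_from_triangles(begin, end)
print(M, b)                                   # [[2.0, 0.0], [0.0, 3.0]] (2.0, 1.0)

# Equation 5: 0.5 v1 + 0.25 v2 + 0.25 v3 should map to 0.5 w1 + 0.25 w2 + 0.25 w3
c = (0.5, 0.25, 0.25)
v = tuple(sum(ci * p[d] for ci, p in zip(c, begin)) for d in range(2))
w = tuple(sum(ci * p[d] for ci, p in zip(c, end)) for d in range(2))
print(apply_affine(M, b, v), w)               # (2.5, 1.75) (2.5, 1.75)
```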

Now suppose that the begin-triangle contains a picture within it (Figure 10.20.3a). That is, to each point in the begin-triangle we assign a gray level, say 0 for white and 100 for black, with any other gray level lying between 0 and 100. In particular, let a scalar-valued function $\rho_0$, called the picture-density of the begin-triangle, be defined so that $\rho_0(\mathbf{v})$ is the gray level at the point v in the begin-triangle. We can now define a picture in the end-triangle, called a warp of the original picture, with a picture-density $\rho_1$ by defining the gray level at the point w within the end-triangle to be the gray level of the point v in the begin-triangle that maps onto w. In equation form, the picture-density $\rho_1$ is determined by

$$\rho_1(\mathbf{w}) = \rho_0(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) \qquad (6)$$

In this way, as $c_1$, $c_2$, and $c_3$ vary over all nonnegative values that add to one, Equation 5 generates all points w in the end-triangle, and Equation 6 generates the gray levels $\rho_1(\mathbf{w})$ of the warped picture at those points (Figure 10.20.3b).
Figure 10.20.3
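Equation 6 suggests computing a warp by working backward from the end-triangle:

1. For each sample point w, find its coefficients $c_1$, $c_2$, $c_3$ relative to $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$ (Equation 5).
2. Read the begin-picture's gray level at $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$.

The Python sketch below does this on a coarse grid of points. The linear gray-level function `rho0`, the triangles, and the grid are all hypothetical test data. A real program would sample stored pixel values instead.

```python
def barycentric(p, p1, p2, p3):
    """Coefficients (c1, c2, c3) of p relative to the triangle p1, p2, p3."""
    a11, a12 = p1[0] - p3[0], p2[0] - p3[0]
    a21, a22 = p1[1] - p3[1], p2[1] - p3[1]
    det = a11 * a22 - a12 * a21
    rx, ry = p[0] - p3[0], p[1] - p3[1]
    c1 = (rx * a22 - a12 * ry) / det
    c2 = (a11 * ry - rx * a21) / det
    return c1, c2, 1.0 - c1 - c2


def warp_triangle(rho0, begin, end, points, tol=1e-12):
    """Warped picture-density rho1 (Equation 6) at the sample points inside the end-triangle."""
    (v1, v2, v3), (w1, w2, w3) = begin, end
    rho1 = {}
    for w in points:
        c1, c2, c3 = barycentric(w, w1, w2, w3)
        if min(c1, c2, c3) < -tol:
            continue                              # w is outside the end-triangle
        v = (c1 * v1[0] + c2 * v2[0] + c3 * v3[0],
             c1 * v1[1] + c2 * v2[1] + c3 * v3[1])
        rho1[w] = rho0(v)                         # gray level carried over from v
    return rho1


def rho0(v):
    """Hypothetical begin-picture: gray level rises from 0 to 100 left to right."""
    return 100.0 * v[0]


begin = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
end = ((2.0, 1.0), (4.0, 1.0), (2.0, 4.0))
grid = [(2.0 + 0.5 * i, 1.0 + 0.5 * j) for i in range(5) for j in range(7)]
rho1 = warp_triangle(rho0, begin, end, grid)
print(rho1[(3.0, 2.0)])    # 50.0, since (3, 2) comes from v = (0.5, 1/3)
```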


Equation 6 determines a very simple warp of a picture within a single triangle. More generally, we can break up a picture into many triangular regions and warp each triangular region differently. This gives us much freedom in designing a warp through our choice of triangular regions and how we change them. To this end, suppose we are given a picture contained within some rectangular region of the plane. We choose n points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ within the rectangle, which we call vertex points, so that they fall on key elements or features of the picture we wish to warp (Figure 10.20.4a). Once the vertex points are chosen, we complete a triangulation of the rectangular region; that is, we draw line segments between the vertex points in such a way that we have the following conditions (Figure 10.20.4b):

1. The line segments form the sides of a set of triangles.

2. The line segments do not intersect.

3. Each vertex point is the vertex of at least one triangle.

4. The union of the triangles is the rectangle.

5. The set of triangles is maximal (i.e., no more vertices can be connected).

Note that condition 4 requires that each corner of the rectangle containing the picture be a vertex point.
Figure 10.20.4


One can always form a triangulation from any n vertex points, but the triangulation is not necessarily unique. For example, Figures 10.20.4b and 10.20.4c are two different triangulations of the set of vertex points in Figure 10.20.4a. Since there are various computer algorithms that perform triangulations very quickly, it is not necessary to perform the tiresome triangulation task by hand; one need only specify the desired vertex points and let a computer generate a triangulation from them. If n is the number of vertex points chosen, it can be shown that the number of triangles m of any triangulation of those points is given by

$$m = 2n - 2 - k \qquad (7)$$

where k is the number of vertex points lying on the boundary of the rectangle, including the four situated at the corner points.
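Equation 7 is easy to spot-check on a simple family of examples. The Python sketch below triangulates a hypothetical p × q grid of vertex points by splitting each grid cell along one diagonal. It then compares the triangle count with 2n − 2 − k.

It also inverts Equation 7 for the warp described with Figure 10.20.6 (94 vertex points and 179 triangles). Assuming the formula applies there, that warp must have had k = 7 boundary vertex points.

```python
def grid_triangulation(p, q):
    """Triangles (index triples) for a p-by-q grid of vertex points on a rectangle."""
    def index(i, j):
        return i * q + j

    triangles = []
    for i in range(p - 1):
        for j in range(q - 1):
            a, b = index(i, j), index(i + 1, j)
            c, d = index(i, j + 1), index(i + 1, j + 1)
            triangles += [(a, b, d), (a, d, c)]    # split the cell along one diagonal
    return triangles


def triangle_count(n, k):
    """Equation 7: number of triangles m = 2n - 2 - k."""
    return 2 * n - 2 - k


for p, q in [(2, 2), (3, 4), (5, 5)]:
    n = p * q                      # all vertex points
    k = 2 * p + 2 * q - 4          # vertex points on the rectangle's boundary
    print(p, q, len(grid_triangulation(p, q)), triangle_count(n, k))

print(2 * 94 - 2 - 179)            # 7 boundary vertex points implied for Figure 10.20.6
```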


The warp is specified by moving the n vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ to new locations $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ according to the changes we desire in the picture (Figures 10.20.5a and 10.20.5b). However, we impose two restrictions on the movements of the vertex points:

1. The four vertex points at the corners of the rectangle are to remain fixed, and any vertex point on a side of the rectangle is to remain fixed or move to another point on the same side of the rectangle. All other vertex points are to remain in the interior of the rectangle.

2. The triangles determined by the triangulation are not to overlap after their vertices have been moved.

The first restriction guarantees that the rectangular shape of the begin-picture is preserved. The second restriction guarantees that the displaced vertex points still form a triangulation of the rectangle and that the new triangulation is similar to the original one. For example, Figure 10.20.5c is not an allowable movement of the vertex points shown in Figure 10.20.5a. Although a violation of this condition can be handled mathematically without too much additional effort, the resulting warps usually produce unnatural results and we will not consider them here.
Figure 10.20.5


Figure 10.20.6 is a warp of a photograph of a woman using a triangulation with 94 vertex points and 179 
triangles. Note that the vertex points in the begin-triangulation are chosen to lie along key features of the 
picture (hairline, eyes, lips, etc.). These vertex points were moved to final positions corresponding to those 
same features in a picture of the woman taken 20 years after the begin-picture. Thus, the warped picture 
represents the woman forced into her older shape but using her younger gray levels. 
Figure 10.20.6


Time-Varying Warps 

A time-varying warp is the set of warps generated when the vertex points of the begin-picture are moved continually in time from their original positions to specified final positions. This gives us a motion picture in which the begin-picture is continually warped to a final warp. Let us choose time units so that t = 0 corresponds to our begin-picture and t = 1 corresponds to our final warp. The simplest way of moving the vertex points from time 0 to time 1 is with constant velocity along straight-line paths from their initial positions to their final positions.


To describe such a motion, let $\mathbf{u}_i(t)$ denote the position of the ith vertex point at any time t between 0 and 1. Thus $\mathbf{u}_i(0) = \mathbf{v}_i$ (its given position in the begin-picture) and $\mathbf{u}_i(1) = \mathbf{w}_i$ (its given position in the final warp). In between, we determine its position by

$$\mathbf{u}_i(t) = (1 - t)\mathbf{v}_i + t\mathbf{w}_i \qquad (8)$$

Note that Equation 8 expresses $\mathbf{u}_i(t)$ as a convex combination of $\mathbf{v}_i$ and $\mathbf{w}_i$ for each t in [0, 1]. Figure 10.20.7 illustrates a time-varying triangulation of a plain rectangular region with six vertex points. The lines connecting the vertex points at the different times are the space-time paths of these vertex points in this space-time diagram.
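Equation 8 takes one line of code per vertex point. The Python sketch below computes the vertex positions at time t. As a rough practical check related to the second restriction on vertex movements, it also tests whether any triangle flips orientation or degenerates (that is, whether its signed area changes sign or reaches zero) at a few sample times.

That sampling test is our own heuristic rather than part of the text, and it can miss a flip that happens between samples. The square, its triangulation, and the end positions are hypothetical data.

```python
def vertex_positions(vs, ws, t):
    """Equation 8: u_i(t) = (1 - t) v_i + t w_i for every vertex point."""
    return [((1 - t) * v[0] + t * w[0], (1 - t) * v[1] + t * w[1])
            for v, w in zip(vs, ws)]


def signed_area(a, b, c):
    """Positive for counterclockwise a, b, c; zero for collinear points."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def keeps_orientation(vs, ws, triangles, steps=20):
    """Heuristic: no triangle degenerates or flips at t = 0, 1/steps, ..., 1."""
    start = [signed_area(*(vs[k] for k in tri)) for tri in triangles]
    for s in range(steps + 1):
        us = vertex_positions(vs, ws, s / steps)
        for area0, tri in zip(start, triangles):
            if area0 * signed_area(*(us[k] for k in tri)) <= 0:
                return False
    return True


# Unit square with one interior vertex point (index 4); the corners stay fixed.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
begin = corners + [(0.5, 0.5)]
fan = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
mid = vertex_positions(begin, corners + [(0.7, 0.3)], 0.5)[4]
print(tuple(round(c, 6) for c in mid))                           # (0.6, 0.4)
print(keeps_orientation(begin, corners + [(0.7, 0.3)], fan))     # True
print(keeps_orientation(begin, corners + [(1.5, 0.5)], fan))     # False
```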



Once the positions of the vertex points are computed at time t, a warp is performed between the begin-picture and the triangulation at time t determined by the displaced vertex points at that time. Figure 10.20.8 shows a time-varying warp at five values of t generated from the warp between t = 0 and t = 1 shown in Figure 10.20.6.

t = 0.00    t = 0.25    t = 0.50    t = 0.75    t = 1.00

Figure 10.20.8






















Morphs 


A time-varying morph can be described as a blending of two time-varying warps of two different pictures using two triangulations that match corresponding features in the two pictures. One of the two pictures is designated as the begin-picture and the other as the end-picture. First, a time-varying warp from t = 0 to t = 1 is generated in which the begin-picture is warped into the shape of the end-picture. Then a time-varying warp from t = 1 to t = 0 is generated in which the end-picture is warped into the shape of the begin-picture. Finally, a weighted average of the gray levels of the two warps at each time t is produced to generate the morph of the two images at time t.


Figure 10.20.9 shows two photographs of a woman taken 20 years apart. Below the pictures are two corresponding triangulations in which corresponding features of the two photographs are matched. The time-varying morph between these two pictures for five values of t between 0 and 1 is shown in Figure 10.20.10.



Begin-picture   End-picture
Begin-triangulation   End-triangulation

Figure 10.20.9



Figure 10.20.10 


















The procedure for producing such a morph is outlined in the following nine steps (Figure 10.20.11):

Step 1. Given a begin-picture with picture-density $\rho_0$ and an end-picture with picture-density $\rho_1$, position n vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in the begin-picture at key features of that picture.

Step 2. Position n corresponding vertex points $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ in the end-picture at the corresponding key features of that picture.

Step 3. Triangulate the begin- and end-pictures in similar ways by drawing lines between corresponding vertex points in both pictures.

Step 4. For any time t between 0 and 1, find the vertex points $\mathbf{u}_1(t), \mathbf{u}_2(t), \ldots, \mathbf{u}_n(t)$ in the morph picture at that time, using the formula

$$\mathbf{u}_i(t) = (1 - t)\mathbf{v}_i + t\mathbf{w}_i, \qquad i = 1, 2, \ldots, n \qquad (9)$$

Step 5. Triangulate the morph picture at time t similar to the begin- and end-picture triangulations.

Step 6. For any point $\mathbf{u}$ in the morph picture at time t, find the triangle in the triangulation of the morph picture in which it lies and the vertices $\mathbf{u}_i(t)$, $\mathbf{u}_j(t)$, and $\mathbf{u}_k(t)$ of that triangle. (See Exercise 1 to determine whether a given point lies in a given triangle.)

Step 7. Express $\mathbf{u}$ as a convex combination of $\mathbf{u}_i(t)$, $\mathbf{u}_j(t)$, and $\mathbf{u}_k(t)$ by finding the constants $c_i$, $c_j$, and $c_k$ such that

$$\mathbf{u} = c_i\mathbf{u}_i(t) + c_j\mathbf{u}_j(t) + c_k\mathbf{u}_k(t) \qquad (10)$$

and

$$c_i + c_j + c_k = 1 \qquad (11)$$

Step 8. Determine the locations of the point $\mathbf{u}$ in the begin- and end-pictures using

$$\mathbf{v} = c_i\mathbf{v}_i + c_j\mathbf{v}_j + c_k\mathbf{v}_k \quad \text{(in the begin-picture)} \qquad (12)$$

and

$$\mathbf{w} = c_i\mathbf{w}_i + c_j\mathbf{w}_j + c_k\mathbf{w}_k \quad \text{(in the end-picture)} \qquad (13)$$

Step 9. Finally, determine the picture-density $\rho_t(\mathbf{u})$ of the morph-picture at the point $\mathbf{u}$ using

$$\rho_t(\mathbf{u}) = (1 - t)\rho_0(\mathbf{v}) + t\rho_1(\mathbf{w}) \qquad (14)$$
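Steps 7 through 9 can be sketched in Python for a single point $\mathbf{u}$, assuming Step 6 has already located the triangle of the time-t triangulation that contains $\mathbf{u}$. Points are (x, y) tuples and the densities $\rho_0$ and $\rho_1$ are passed in as functions; the helper names `barycentric`, `combine`, and `morph_density` are hypothetical. Equations 10 and 11 are solved by substituting $c_k = 1 - c_i - c_j$ and applying Cramer's rule to the resulting 2 × 2 system.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]

def barycentric(u: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Solve Equations 10 and 11 for (ci, cj, ck) given the triangle vertices a, b, c."""
    # Substitute ck = 1 - ci - cj, which leaves u - c = ci (a - c) + cj (b - c).
    a1, a2 = a[0] - c[0], a[1] - c[1]
    b1, b2 = b[0] - c[0], b[1] - c[1]
    r1, r2 = u[0] - c[0], u[1] - c[1]
    det = a1 * b2 - b1 * a2
    if det == 0:
        raise ValueError("triangle vertices are collinear")
    ci = (r1 * b2 - b1 * r2) / det      # Cramer's rule
    cj = (a1 * r2 - r1 * a2) / det
    return ci, cj, 1.0 - ci - cj

def combine(coeffs: Tuple[float, float, float], tri: Triangle) -> Point:
    """Apply the same convex combination to another triangle (Equations 12 and 13)."""
    (p, q, r), (ci, cj, ck) = tri, coeffs
    return (ci * p[0] + cj * q[0] + ck * r[0], ci * p[1] + cj * q[1] + ck * r[1])

def morph_density(u: Point, t: float, tri_t: Triangle, tri_begin: Triangle,
                  tri_end: Triangle, rho0: Callable[[Point], float],
                  rho1: Callable[[Point], float]) -> float:
    """Steps 7-9 for a point u already known to lie in triangle tri_t at time t."""
    coeffs = barycentric(u, *tri_t)           # Step 7
    v = combine(coeffs, tri_begin)            # Step 8, Equation 12
    w = combine(coeffs, tri_end)              # Step 8, Equation 13
    return (1 - t) * rho0(v) + t * rho1(w)    # Step 9, Equation 14
```

The sketch leaves out Step 6 (point location); one simple approach is to apply the convex-combination test of Exercise 1 to each triangle until all three coefficients come out nonnegative.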

Step 9 is the key step in distinguishing a warp from a morph. Equation 14 takes weighted averages of the gray levels of the begin- and end-pictures to produce the gray levels of the morph-picture. The weights depend on the fraction of the distances that the vertex points have moved from their beginning positions to their ending positions. For example, if the vertex points have moved one-fourth of the way to their destinations (i.e., if t = 0.25), then we use one-fourth of the gray levels of the end-picture and three-fourths of the gray levels of the begin-picture. Thus, as time progresses, not only does the shape of the begin-picture gradually change into the shape of the end-picture (as in a warp) but the gray levels of the begin-picture also gradually change into the gray levels of the end-picture.



Time = 1: End-picture, given density $\rho_1(\mathbf{w})$
Time = t: Morph-picture, computed density $\rho_t(\mathbf{u}) = (1 - t)\rho_0(\mathbf{v}) + t\rho_1(\mathbf{w})$
Time = 0: Begin-picture, given density $\rho_0(\mathbf{v})$

Figure 10.20.11


The procedure described above to generate a morph is cumbersome to perform by hand, but it is the kind of 
dull, repetitive procedure at which computers excel. A successful morph demands good preparation and 
requires more artistic ability than mathematical ability. (The software designer is required to have the 
mathematical ability.) The two photographs to be morphed should be carefully chosen so that they have 
matching features, and the vertex points in the two photographs also should be carefully chosen so that the 
triangles in the two resulting triangulations contain similar features of the two pictures. When the procedure is 
done correctly, each frame of the morph should look just as “real” as the begin- and end-pictures. 

The techniques we have discussed in this section can be generalized in numerous ways to produce much more 
elaborate warps and morphs. For example: 

If the pictures are in color, the three components of the picture colors (red, green, and blue) can be 
morphed separately to produce a color morph. 

Rather than following straight-line paths to their destinations, the vertices of a triangulation can be directed 
separately along more complicated paths to produce a variety of results. 

Rather than travel with constant speeds along their paths, the vertices of a triangulation can be directed to 
have different speeds at different times. For example, in a morph between two faces, the hairline can be 
made to change first, then the nose, and so forth. 

Similarly, the gray-level mixing of the begin-picture and end-picture at different times and different 
vertices can be varied in a more complicated way than that in Equation 14. 

One can morph two surfaces in three-dimensional space (representing two complete heads, for example) 
by triangulating the surfaces and using the techniques in this section. 












One can morph two solids in three-dimensional space (for example, two three-dimensional tomographs of 
a beating human heart at two different times) by dividing the two solids into corresponding tetrahedral 
regions. 

Two film strips can be morphed frame by frame by different amounts between each pair of frames to 
produce a morphed film strip in which, say, an actor walking along a set is gradually morphed into an ape 
walking along the set. 

Instead of using straight lines to triangulate two pictures to be morphed, more complicated curves, such as 
spline curves, can be matched between the two pictures. 

Three or more pictures can be morphed together by generalizing the formulas given in this section. 

These and other generalizations have made warping and morphing two of the most active areas in computer 
graphics. 


Exercise Set 10.20 

1. Determine whether the vector $\mathbf{v}$ is a convex combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Do this by solving Equations 1 and 3 for $c_1$, $c_2$, and $c_3$ and ascertaining whether these coefficients are nonnegative.

(a) v = 

(b) v = 

(c) v = 

(d) v = 

Answer: 

(a) Yes; v = jvi + jV 2 + JV 3 

(b) No; v = -jvi + ^-V 2 — ^3 

( c ) Yes; v = yvi + -jV 2 4- OV 3 

(d) Yes; v = yjVi + y^-v 2 + -jjv 3 

2. Verify Equation 7 for the two triangulations given in Figure 10.20.4. 

Answer: 

m = number of triangles = 7, n = number of vertex points = 7, k = number of boundary vertex points = 5; Equation 7 is $7 = 2(7) - 2 - 5$.

3. Let an affine transformation be given by a 2 × 2 matrix M and a two-dimensional vector $\mathbf{b}$. Let $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$, where $c_1 + c_2 + c_3 = 1$; let $\mathbf{w} = M\mathbf{v} + \mathbf{b}$; and let $\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b}$ for $i = 1, 2, 3$. Show that $\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$. (This shows that an affine transformation maps a convex combination of vectors to the same convex combination of the images of the vectors.)


Answer: 


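$$\begin{aligned} \mathbf{w} = M\mathbf{v} + \mathbf{b} &= M(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) + (c_1 + c_2 + c_3)\mathbf{b} \\ &= c_1(M\mathbf{v}_1 + \mathbf{b}) + c_2(M\mathbf{v}_2 + \mathbf{b}) + c_3(M\mathbf{v}_3 + \mathbf{b}) = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 \end{aligned}$$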
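The identity in Exercise 3 can also be spot-checked numerically. This Python sketch uses exact `Fraction` arithmetic so that equality can be tested directly; the matrix, vector, and coefficients are arbitrary values chosen for illustration, and `affine` and `convex` are hypothetical helpers.

```python
from fractions import Fraction as F

def affine(M, b, v):
    """Return M v + b for a 2 x 2 matrix M (given as rows) and 2-vectors b and v."""
    return (M[0][0] * v[0] + M[0][1] * v[1] + b[0],
            M[1][0] * v[0] + M[1][1] * v[1] + b[1])

def convex(cs, pts):
    """Return c1*p1 + c2*p2 + c3*p3, computed componentwise."""
    return (sum(c * p[0] for c, p in zip(cs, pts)),
            sum(c * p[1] for c, p in zip(cs, pts)))

M = [[2, 1], [0, 3]]
b = (5, -1)
vs = [(0, 0), (4, 0), (1, 3)]
cs = (F(1, 2), F(1, 3), F(1, 6))           # nonnegative, and the sum is 1

w = affine(M, b, convex(cs, vs))           # w = M v + b
ws = [affine(M, b, v) for v in vs]         # w_i = M v_i + b
print(w == convex(cs, ws))                 # True

# Without the condition c1 + c2 + c3 = 1 the identity fails when b is nonzero:
ones = (1, 1, 1)
print(affine(M, b, convex(ones, vs)) == convex(ones, ws))   # False
```

The second check shows where the hypothesis $c_1 + c_2 + c_3 = 1$ enters the proof: it is what allows $\mathbf{b}$ to be rewritten as $(c_1 + c_2 + c_3)\mathbf{b}$.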

(a) Exhibit a triangulation of the points in Figure 10.20.4 in which the points V 3 , v_j, and v$ form the 
vertices of a single triangle. 

(b) Exhibit a triangulation of the points in Figure 10.20.4 in which the points V 2 , v_j, and vj do not form 
the vertices of a single triangle. 

Answer: 

(a) [Figure: a triangulation of the points in Figure 10.20.4]

(b) [Figure: a triangulation of the points in Figure 10.20.4]

5. Find the 2 × 2 matrix M and two-dimensional vector $\mathbf{b}$ that define the affine transformation that maps the three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ to the three vectors $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$. Do this by setting up a system of six linear equations for the four entries of the matrix M and the two entries of the vector $\mathbf{b}$.


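(a) $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\mathbf{w}_1 = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$, $\mathbf{w}_2 = \begin{bmatrix} 9 \\ 5 \end{bmatrix}$, $\mathbf{w}_3 = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$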

(b) 

vi = 

1 

00 Csl 

1 

1_ 

> v 2 = 

1 

O O 

1_ 

, v 3 = 

2 

1 

,W1 = 

-8 

1 

, w 2 = 

0 

1 

, w 3 = 

5 
(c)

vi = 

-2 

1 

- v 2 = 

3 

5 

- v 3 = 

1 

0 

,WJ = 

1 l l 

O CSI 

1 

1 1_1 

, W 2 = 

5 

2 

, w 3 = 


3 

3 

(d) 

VI = 

1 1 

O Csl 

'2 

V2 = „ 

_2 

, ’ 

v 3 = 

l l 

csi 

,W1 = 

5' 

2 

_-l_ 

, W2 = 

1 1 

00 

, W3 = 


7 

‘2 

-9 


Answer: 

























































(a) Let $\mathbf{a}$ and $\mathbf{b}$ be linearly independent vectors in the plane. Show that if $c_1$ and $c_2$ are nonnegative numbers such that $c_1 + c_2 = 1$, then the vector $c_1\mathbf{a} + c_2\mathbf{b}$ lies on the line segment connecting the tips of the vectors $\mathbf{a}$ and $\mathbf{b}$.

(b) Let $\mathbf{a}$ and $\mathbf{b}$ be linearly independent vectors in the plane. Show that if $c_1$ and $c_2$ are nonnegative numbers such that $c_1 + c_2 \le 1$, then the vector $c_1\mathbf{a} + c_2\mathbf{b}$ lies in the triangle connecting the origin and the tips of the vectors $\mathbf{a}$ and $\mathbf{b}$. [Hint: First examine the vector $c_1\mathbf{a} + c_2\mathbf{b}$ multiplied by the scale factor $1/(c_1 + c_2)$.]

(c) Let $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ be noncollinear points in the plane. Show that if $c_1$, $c_2$, and $c_3$ are nonnegative numbers such that $c_1 + c_2 + c_3 = 1$, then the vector $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ lies in the triangle connecting the tips of the three vectors. [Hint: Let $\mathbf{a} = \mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{b} = \mathbf{v}_2 - \mathbf{v}_3$, and then use Equation 1 and part (b) of this exercise.]

(a) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that determine a convex combination $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if $\mathbf{v}$ lies on one of the three vertices of the triangle determined by the three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$?

(b) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that determine a convex combination $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if $\mathbf{v}$ lies on one of the three sides of the triangle determined by the three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$?

(c) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that determine a convex combination $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if $\mathbf{v}$ lies in the interior of the triangle determined by the three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$?

Answer: 

(a) Two of the coefficients are zero. 

(b) At least one of the coefficients is zero. 

(c) None of the coefficients are zero. 
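These answers can be illustrated by computing the coefficients directly. The Python sketch below solves $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$, $c_1 + c_2 + c_3 = 1$ with Cramer's rule (using $\mathbf{a} = \mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{b} = \mathbf{v}_2 - \mathbf{v}_3$, as in the hint of the preceding exercise) and then counts the zero coefficients. Exact `Fraction` arithmetic is assumed so that the zero tests are reliable; the function names are hypothetical, and "side" here means a point on a side that is not a vertex.

```python
from fractions import Fraction

def coefficients(v, v1, v2, v3):
    """Return (c1, c2, c3) with v = c1 v1 + c2 v2 + c3 v3 and c1 + c2 + c3 = 1."""
    a = (Fraction(v1[0] - v3[0]), Fraction(v1[1] - v3[1]))   # a = v1 - v3
    b = (Fraction(v2[0] - v3[0]), Fraction(v2[1] - v3[1]))   # b = v2 - v3
    r = (Fraction(v[0] - v3[0]), Fraction(v[1] - v3[1]))     # v - v3 = c1 a + c2 b
    det = a[0] * b[1] - b[0] * a[1]
    if det == 0:
        raise ValueError("v1, v2, v3 are collinear")
    c1 = (r[0] * b[1] - b[0] * r[1]) / det
    c2 = (a[0] * r[1] - r[0] * a[1]) / det
    return c1, c2, 1 - c1 - c2

def locate(v, v1, v2, v3):
    """Classify v relative to the triangle with vertices v1, v2, v3."""
    cs = coefficients(v, v1, v2, v3)
    if any(c < 0 for c in cs):
        return "outside"                 # not a convex combination
    zeros = sum(1 for c in cs if c == 0)
    return {2: "vertex", 1: "side"}.get(zeros, "interior")

tri = ((0, 0), (4, 0), (0, 4))
print([locate(p, *tri) for p in [(0, 0), (2, 0), (1, 1), (5, 5)]])
```

With the sample triangle above, the four test points are classified as a vertex, a side point, an interior point, and an outside point, in that order, matching answers (a), (b), and (c).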

(a) The centroid of a triangle lies on the line segment connecting any one of the three vertices of the triangle with the midpoint of the opposite side. Its location on this line segment is two-thirds of the distance from the vertex. If the three vertices are given by the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, write the centroid as a convex combination of these three vectors.

(b) Use your result in part (a) to find the vector defining the centroid of the triangle with the three vertices 




Answer: 


(a) $\frac{1}{3}\mathbf{v}_1 + \frac{1}{3}\mathbf{v}_2 + \frac{1}{3}\mathbf{v}_3$


(b) 


3 1 3 

8/3" 
Technology Exercises


The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra 
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


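T1. To warp or morph a surface in $R^3$ we must be able to triangulate the surface. Let

$$\mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{12} \\ v_{13} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} v_{21} \\ v_{22} \\ v_{23} \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} v_{31} \\ v_{32} \\ v_{33} \end{bmatrix}$$

be three noncollinear vectors on the surface. Then a vector

$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$

lies in the triangle formed by these three vectors if and only if $\mathbf{v}$ is a convex combination of the three vectors; that is, $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ for some nonnegative coefficients $c_1$, $c_2$, and $c_3$ whose sum is 1.

(a) Show that in this case, $c_1$, $c_2$, and $c_3$ are solutions of the following linear system:

$$\begin{bmatrix} v_{11} & v_{21} & v_{31} \\ v_{12} & v_{22} & v_{32} \\ v_{13} & v_{23} & v_{33} \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{bmatrix}$$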

In parts (b)-(d) determine whether the vector v is a convex combination of the vectors vi = 


(b) 


(c) 



3" 


2 " 

v 2 = 

0 

, and V 3 = 

2 


9 


-4 


'9' 



V = I 

9 



4 

9 




2 

7 

-5 




10 

9 

9 




























(d) 1 r 13 ■ 

V=T =7 


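T2. To warp or morph a solid object in $R^3$ we first partition the object into disjoint tetrahedrons. Let

$$\mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{12} \\ v_{13} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} v_{21} \\ v_{22} \\ v_{23} \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} v_{31} \\ v_{32} \\ v_{33} \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_4 = \begin{bmatrix} v_{41} \\ v_{42} \\ v_{43} \end{bmatrix}$$

be four noncoplanar vectors. Then a vector

$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$

lies in the solid tetrahedron formed by these four vectors if and only if $\mathbf{v}$ is a convex combination of the four vectors; that is, $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + c_4\mathbf{v}_4$ for some nonnegative coefficients $c_1$, $c_2$, $c_3$, and $c_4$ whose sum is one.

(a) Show that in this case, $c_1$, $c_2$, $c_3$, and $c_4$ are solutions of the following linear system:

$$\begin{bmatrix} v_{11} & v_{21} & v_{31} & v_{41} \\ v_{12} & v_{22} & v_{32} & v_{42} \\ v_{13} & v_{23} & v_{33} & v_{43} \\ 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{bmatrix}$$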
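If none of the listed utilities is at hand, the 4 × 4 system of part (a) can also be solved with a short Python sketch. It builds the coefficient matrix with columns $\mathbf{v}_1, \ldots, \mathbf{v}_4$ and a final row of ones, solves it by Gauss-Jordan elimination in exact `Fraction` arithmetic, and reports whether all four coefficients are nonnegative. The function names are hypothetical, and the unit tetrahedron used in the checks is chosen for illustration; it is not the data of parts (b)-(d).

```python
from fractions import Fraction

def tetra_coefficients(v, v1, v2, v3, v4):
    """Solve the part (a) system for [c1, c2, c3, c4]."""
    verts = (v1, v2, v3, v4)
    # Rows 1-3: the coordinates of v1..v4 as columns, augmented by v; row 4: all ones.
    aug = [[Fraction(p[r]) for p in verts] + [Fraction(v[r])] for r in range(3)]
    aug.append([Fraction(1)] * 4 + [Fraction(1)])
    n = 4
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the four vectors are coplanar")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [a / p for a in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def in_tetrahedron(v, v1, v2, v3, v4):
    """v lies in the solid tetrahedron exactly when every coefficient is nonnegative."""
    return all(c >= 0 for c in tetra_coefficients(v, v1, v2, v3, v4))

unit = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
print(in_tetrahedron((Fraction(1, 4),) * 3, *unit), in_tetrahedron((1, 1, 1), *unit))
```

For the unit tetrahedron, the point (1/4, 1/4, 1/4) has all four coefficients equal to 1/4, while (1, 1, 1) gives $c_1 = -2$ and so lies outside.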


In parts (b)-(d) determine whether the vector v is a convex combination of the vectors vi 

” 3 1 [ 7 1 r _i 

v 2 = 4 , V3 = 2 , and V4 = 3 . 

2 J L 3 J L 2 

(b) [5" 

v = 0 

7 

(c) fl" 

v= 1 

2 

(d) rr 

v = 2 
2 
How to Read Theorems

Since many of the most important concepts in linear algebra occur as theorem statements, it is important to be 
familiar with the various ways in which theorems can be structured. This appendix will help you to do that. 




Contrapositive Form of a Theorem 

The simplest theorems are of the form 


If H is true, then C is true.    (1)

where H is a statement, called the hypothesis, and C is a statement, called the conclusion. The theorem is true if the conclusion is true whenever the hypothesis is true, and the theorem is false if there is some case where the hypothesis is true but the conclusion is false. It is common to denote a theorem of form 1 as

$$H \Rightarrow C \qquad (2)$$

(read, “H implies C”). As an example, the theorem

If a and b are both positive numbers, then ab is a positive number. (3) 


is of form 2, where

H = a and b are both positive numbers    (4)

C = ab is a positive number    (5)

Sometimes it is desirable to phrase theorems in a negative way. For example, the theorem in 3 can be 
rephrased equivalently as 

If ab is not a positive number, then a and b are not both positive numbers. (6) 

If we write ~H to mean that 4 is false and ~C to mean that 5 is false, then the structure of the theorem in 6 is

$$\sim C \Rightarrow \sim H \qquad (7)$$


In general, any theorem of form 2 can be rephrased in form 7, which is called the contrapositive of 2. If a 
theorem is true, then so is its contrapositive, and vice versa. 
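The equivalence of a theorem and its contrapositive, and the failure of the converse to be equivalent, can be checked by running through all four truth-value combinations of H and C. The Python sketch below does this; `implies` is a hypothetical helper encoding the rule that $H \Rightarrow C$ is false only when H is true and C is false.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Truth value of p => q: false only when p is true and q is false."""
    return (not p) or q

rows = list(product([False, True], repeat=2))    # all (H, C) truth assignments
contrapositive_agrees = all(implies(h, c) == implies(not c, not h) for h, c in rows)
converse_agrees = all(implies(h, c) == implies(c, h) for h, c in rows)
print(contrapositive_agrees, converse_agrees)    # True False
```

The converse fails in the row where H is true and C is false, which is exactly the situation a counterexample to the converse must exhibit.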


Converse of a Theorem 

The converse of a theorem is the statement that results when the hypothesis and conclusion are interchanged. Thus, the converse of the theorem $H \Rightarrow C$ is the statement $C \Rightarrow H$. Whereas the contrapositive of a true theorem must itself be a true theorem, the converse of a true theorem may or may not be true. For example, the converse of 3 is the false statement

If ab is a positive number, then a and b are both positive numbers.

but the converse of the true theorem

If a > b, then 2a > 2b.    (8)

is the true theorem

If 2a > 2b, then a > b.    (9)


Equivalent Statements 

If a theorem $H \Rightarrow C$ and its converse $C \Rightarrow H$ are both true, then we say that H and C are equivalent statements, which we denote by writing

$$H \Leftrightarrow C \qquad (10)$$

(read, “H and C are equivalent”). There are various ways of phrasing equivalent statements as a single theorem. Here are three ways in which 8 and 9 can be combined into a single theorem.




Form 1

If a > b, then 2a > 2b, and conversely, if 2a > 2b, then a > b.

Form 2

a > b if and only if 2a > 2b.

Form 3

The following statements are equivalent.

(i) a > b
(ii) 2a > 2b


Theorems Involving Three or More Statements 


Sometimes two true theorems will give you a third true theorem for free. Specifically, if $H \Rightarrow C$ is a true theorem, and $C \Rightarrow D$ is a true theorem, then $H \Rightarrow D$ must also be a true theorem. For example, the theorems

If opposite sides of a quadrilateral are parallel, then the quadrilateral is a parallelogram.

and

Opposite sides of a parallelogram have equal lengths.

imply the third theorem

If opposite sides of a quadrilateral are parallel, then they have equal lengths.

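Sometimes three theorems yield equivalent statements for free. For example, if

$$H \Rightarrow C, \quad C \Rightarrow D, \quad D \Rightarrow H \qquad (11)$$

then we have the implication loop in Figure A.1, from which we can conclude, by following the loop around (for example, $C \Rightarrow D$ and $D \Rightarrow H$ give $C \Rightarrow H$), that

$$C \Rightarrow H, \quad D \Rightarrow C, \quad H \Rightarrow D \qquad (12)$$

Combining this with 11 we obtain

$$H \Leftrightarrow C, \quad C \Leftrightarrow D, \quad D \Leftrightarrow H \qquad (13)$$

In summary, if you want to prove the three equivalences in 13, you need only prove the three implications in 11.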


Figure A.1 (the implication loop $H \Rightarrow C \Rightarrow D \Rightarrow H$)
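The argument behind 11 through 13 can also be confirmed by brute force over the eight truth-value assignments of H, C, and D. In this Python sketch (the helper `implies` is hypothetical), only the assignments satisfying all three implications in 11 are kept, and in each of them H, C, and D have the same truth value, which is what the equivalences in 13 assert.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Truth value of p => q."""
    return (not p) or q

# Keep only the (H, C, D) assignments consistent with the three implications in (11).
loop_rows = [(h, c, d) for h, c, d in product([False, True], repeat=3)
             if implies(h, c) and implies(c, d) and implies(d, h)]

# In every surviving row the three statements share one truth value, as (13) asserts.
all_equivalent = all(h == c == d for h, c, d in loop_rows)
print(loop_rows)        # [(False, False, False), (True, True, True)]
print(all_equivalent)   # True
```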
Complex Numbers


Complex numbers arise naturally in the course of solving polynomial equations. For example, the solutions of the quadratic equation $ax^2 + bx + c = 0$, which are given by the quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

are complex numbers if the expression inside the radical is negative. In this appendix we will review some of the basic ideas about complex numbers that are used in this text.




Complex Numbers 

To deal with the problem that the equation $x^2 = -1$ has no real solutions, mathematicians of the eighteenth century invented the “imaginary” number

$$i = \sqrt{-1}$$

which is assumed to have the property

$$i^2 = (\sqrt{-1})^2 = -1$$

but which otherwise has the algebraic properties of a real number. An expression of the form

$$a + bi \quad\text{or}\quad a + ib$$

in which a and b are real numbers is called a complex number. Sometimes it will be convenient to use a single letter, typically z, to denote a complex number, in which case we write

$$z = a + bi \quad\text{or}\quad z = a + ib$$

The number a is called the real part of z and is denoted by Re(z), and the number b is called the imaginary part of z and is denoted by Im(z). Thus,

Re(3 + 2i) = 3, Im(3 + 2i) = 2

Re(1 - 5i) = 1, Im(1 - 5i) = Im(1 + (-5)i) = -5

Re(7i) = Re(0 + 7i) = 0, Im(7i) = 7

Re(4) = 4, Im(4) = Im(4 + 0i) = 0

Two complex numbers are considered equal if and only if their real parts are equal and their imaginary parts 
are equal; that is, 

a + bi = c + di if and only if a=c and b = d 

A complex number z = bi whose real part is zero is said to be pure imaginary. A complex number z = a whose imaginary part is zero is a real number, so the real numbers can be viewed as a subset of the complex numbers.


Complex numbers are added, subtracted, and multiplied in accordance with the standard rules of algebra but with $i^2 = -1$:

$$(a + bi) + (c + di) = (a + c) + (b + d)i \qquad (1)$$

$$(a + bi) - (c + di) = (a - c) + (b - d)i \qquad (2)$$

$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i \qquad (3)$$

The multiplication formula is obtained by expanding the left side and using the fact that $i^2 = -1$. Also note that if $b = 0$, then the multiplication formula simplifies to

$$a(c + di) = ac + adi \qquad (4)$$


The set of complex numbers with these operations is commonly denoted by the symbol C and is called the 
complex number system. 


EXAMPLE 1 Multiplying Complex Numbers

As a practical matter, it is usually more convenient to compute products of complex numbers by expansion, rather than substituting in 3. For example,

$$(3 - 2i)(4 + 5i) = 12 + 15i - 8i - 10i^2 = (12 + 10) + 7i = 22 + 7i$$
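Equation 3 translates directly into code. The Python sketch below (the function `multiply` is a hypothetical helper) applies it to the real and imaginary parts and compares the result with Python's built-in complex type, which writes the imaginary unit as `j`.

```python
from typing import Tuple

def multiply(a: float, b: float, c: float, d: float) -> Tuple[float, float]:
    """Return the real and imaginary parts of (a + bi)(c + di), using Equation 3."""
    return (a * c - b * d, a * d + b * c)

print(multiply(3, -2, 4, 5))   # (22, 7)
print((3 - 2j) * (4 + 5j))     # (22+7j)
```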


The Complex Plane 

A complex number $z = a + bi$ can be associated with the ordered pair $(a, b)$ of real numbers and represented geometrically by a point or a vector in the xy-plane (Figure B.1). We call this the complex plane. Points on the x-axis have an imaginary part of zero and hence correspond to real numbers, whereas points on the y-axis have a real part of zero and correspond to pure imaginary numbers. Accordingly, we call the x-axis the real axis and the y-axis the imaginary axis (Figure B.2).

Figure B.1
Figure B.2


Complex numbers can be added, subtracted, or multiplied by real numbers geometrically by performing these operations on their associated vectors (Figure B.3, for example). In this sense the complex number system C is closely related to $R^2$, the main difference being that complex numbers can be multiplied to produce other complex numbers, whereas there is no multiplication operation on $R^2$ that produces other vectors in $R^2$ (the dot product produces a scalar, not a vector in $R^2$).
Figure B.3

If $z = a + bi$ is a complex number, then the complex conjugate of z, or more simply, the conjugate of z, is denoted by $\bar{z}$ (read, “z bar”) and is defined by

$$\bar{z} = a - bi \qquad (5)$$

Numerically, $\bar{z}$ is obtained from z by reversing the sign of the imaginary part, and geometrically it is obtained by reflecting the vector for z about the real axis (Figure B.4).

















Figure B.4


EXAMPLE 2 Some Complex Conjugates

$z = 3 + 4i \qquad \bar{z} = 3 - 4i$

$z = -2 - 5i \qquad \bar{z} = -2 + 5i$

$z = i \qquad \bar{z} = -i$

$z = 1 \qquad \bar{z} = 1$

The last computation in this example illustrates the fact that a real number is equal to its complex conjugate. More generally, $\bar{z} = z$ if and only if z is a real number.


The following computation shows that the product of a complex number $z = a + bi$ and its conjugate $\bar{z} = a - bi$ is a nonnegative real number:

$$z\bar{z} = (a + bi)(a - bi) = a^2 - abi + bai - b^2i^2 = a^2 + b^2 \qquad (6)$$

You will recognize that

$$\sqrt{z\bar{z}} = \sqrt{a^2 + b^2}$$

is the length of the vector corresponding to z (Figure B.5); we call this length the modulus (or absolute value) of z and denote it by |z|. Thus,

$$|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2} \qquad (7)$$

Note that if $b = 0$, then $z = a$ is a real number and

$$|z| = \sqrt{a^2} = |a|$$

which tells us that the modulus of a real number is the same as its absolute value as defined in beginning algebra.
Figure B.5


EXAMPLE 3 Some Modulus Computations

$z = 3 + 4i \qquad |z| = \sqrt{3^2 + 4^2} = 5$

$z = -4 - 5i \qquad |z| = \sqrt{(-4)^2 + (-5)^2} = \sqrt{41}$

$z = i \qquad |z| = \sqrt{0^2 + 1^2} = 1$
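Formula 7 can be checked against Python's built-in `abs` for complex numbers. In this sketch, `modulus` is a hypothetical helper that forms $z\bar{z}$ with the standard `complex.conjugate` method and takes the square root of its (real) value.

```python
import math

def modulus(z: complex) -> float:
    """|z| = sqrt(z * conj(z)); the product z * conj(z) has zero imaginary part."""
    return math.sqrt((z * z.conjugate()).real)

for z in (3 + 4j, -4 - 5j, 1j):
    print(z, modulus(z), abs(z))
```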


Reciprocals and Division 


If $z \ne 0$, then the reciprocal (or multiplicative inverse) of z is denoted by $1/z$ (or $z^{-1}$) and is defined by the property

$$\left(\frac{1}{z}\right)z = 1$$

This equation has a unique solution for $1/z$, which we can obtain by multiplying both sides by $\bar{z}$ and using the fact that $z\bar{z} = |z|^2$ [see 7]. This yields

$$\frac{1}{z} = \frac{\bar{z}}{|z|^2} \qquad (8)$$

If $z_2 \ne 0$, then the quotient $z_1/z_2$ is defined to be the product of $z_1$ and $1/z_2$. This yields the formula

$$\frac{z_1}{z_2} = \frac{\bar{z}_2}{|z_2|^2}\,z_1 = \frac{z_1\bar{z}_2}{|z_2|^2} \qquad (9)$$

Observe that the expression on the right side of 9 results if the numerator and denominator of $z_1/z_2$ are multiplied by $\bar{z}_2$. As a practical matter, this is often the best way to perform divisions of complex numbers.











EXAMPLE 4 Division of Complex Numbers

Let $z_1 = 3 + 4i$ and $z_2 = 1 - 2i$. Express $z_1/z_2$ in the form $a + bi$.

We will multiply the numerator and denominator of $z_1/z_2$ by $\bar{z}_2$. This yields

$$\frac{z_1}{z_2} = \frac{z_1\bar{z}_2}{z_2\bar{z}_2} = \frac{3 + 4i}{1 - 2i}\cdot\frac{1 + 2i}{1 + 2i} = \frac{3 + 6i + 4i + 8i^2}{1 - 4i^2} = \frac{-5 + 10i}{5} = -1 + 2i$$
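The method of Example 4 (multiply the numerator and denominator by $\bar{z}_2$) is what Formula 9 expresses. A minimal Python sketch, with `divide` as a hypothetical helper, is shown below; in practice Python's `/` operator on complex numbers gives the same quotient.

```python
def divide(z1: complex, z2: complex) -> complex:
    """z1 / z2 = z1 * conj(z2) / |z2|^2, as in Formula 9."""
    if z2 == 0:
        raise ZeroDivisionError("z2 must be nonzero")
    denom = (z2 * z2.conjugate()).real     # |z2|^2, a positive real number
    num = z1 * z2.conjugate()              # z1 * conj(z2)
    return complex(num.real / denom, num.imag / denom)

print(divide(3 + 4j, 1 - 2j))              # (-1+2j)
```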


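The following theorems list some useful properties of the modulus and conjugate operations.

THEOREM B.1

The following results hold for any complex numbers z, $z_1$, and $z_2$ (with $z_2 \ne 0$ in the part that involves division).

(a) $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$

(b) $\overline{z_1 - z_2} = \bar{z}_1 - \bar{z}_2$

(c) $\overline{z_1 z_2} = \bar{z}_1\bar{z}_2$

(d) $\overline{z_1/z_2} = \bar{z}_1/\bar{z}_2$

(e) $\bar{\bar{z}} = z$

THEOREM B.2

The following results hold for any complex numbers z, $z_1$, and $z_2$ (with $z_2 \ne 0$ in the part that involves division).

(a) $|\bar{z}| = |z|$

(b) $|z_1 z_2| = |z_1||z_2|$

(c) $|z_1/z_2| = |z_1|/|z_2|$

(d) $|z_1 + z_2| \le |z_1| + |z_2|$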
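Theorems B.1 and B.2 can be spot-checked numerically. The Python sketch below tests each part on randomly chosen complex numbers with a small floating-point tolerance; it is only a sanity check on sample values, not a proof, and the helper names are hypothetical.

```python
import random

def close(x: complex, y: complex, tol: float = 1e-9) -> bool:
    """Compare two complex (or real) values up to a relative tolerance."""
    return abs(x - y) <= tol * max(1.0, abs(x), abs(y))

def check_theorems(z: complex, z1: complex, z2: complex) -> bool:
    """Test Theorem B.1 (a)-(e) and Theorem B.2 (a)-(d); z2 must be nonzero."""
    conj = complex.conjugate
    b1 = [
        close(conj(z1 + z2), conj(z1) + conj(z2)),   # B.1(a)
        close(conj(z1 - z2), conj(z1) - conj(z2)),   # B.1(b)
        close(conj(z1 * z2), conj(z1) * conj(z2)),   # B.1(c)
        close(conj(z1 / z2), conj(z1) / conj(z2)),   # B.1(d)
        conj(conj(z)) == z,                          # B.1(e)
    ]
    b2 = [
        abs(conj(z)) == abs(z),                      # B.2(a)
        close(abs(z1 * z2), abs(z1) * abs(z2)),      # B.2(b)
        close(abs(z1 / z2), abs(z1) / abs(z2)),      # B.2(c)
        abs(z1 + z2) <= abs(z1) + abs(z2) + 1e-12,   # B.2(d), the triangle inequality
    ]
    return all(b1) and all(b2)

rng = random.Random(0)                               # fixed seed for repeatability

def sample() -> complex:
    return complex(rng.uniform(-10, 10), rng.uniform(-10, 10))

print(all(check_theorems(sample(), sample(), sample()) for _ in range(1000)))
```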










Polar Form of a Complex Number 


If $z = a + bi$ is a nonzero complex number, and if $\phi$ is an angle from the real axis to the vector z, then, as suggested in Figure B.6, the real and imaginary parts of z can be expressed as

$$a = |z|\cos\phi \quad\text{and}\quad b = |z|\sin\phi \qquad (10)$$

Thus, the complex number $z = a + bi$ can be expressed as

$$z = |z|(\cos\phi + i\sin\phi) \qquad (11)$$

which is called a polar form of z. The angle $\phi$ in this formula is called an argument of z. The argument of z is not unique because we can add or subtract any multiple of $2\pi$ to it to obtain a different argument of z. However, there is only one argument whose radian measure satisfies

$$-\pi < \phi \le \pi \qquad (12)$$

This is called the principal argument of z.

Figure B.6 ($a = |z|\cos\phi$, $b = |z|\sin\phi$)


EXAMPLE 5 Polar Form of a Complex Number

Express $z = 1 - \sqrt{3}\,i$ in polar form using the principal argument.

The modulus of z is

$$|z| = \sqrt{1^2 + (-\sqrt{3})^2} = \sqrt{4} = 2$$

Thus, it follows from 10 with $a = 1$ and $b = -\sqrt{3}$ that

$$1 = 2\cos\phi \quad\text{and}\quad -\sqrt{3} = 2\sin\phi$$

and this implies that

$$\cos\phi = \frac{1}{2} \quad\text{and}\quad \sin\phi = -\frac{\sqrt{3}}{2}$$

The unique angle $\phi$ that satisfies these equations and whose radian measure satisfies 12 is $\phi = -\pi/3$ (Figure B.7). Thus, a polar form of z is

$$z = 2\left(\cos\left(-\frac{\pi}{3}\right) + i\sin\left(-\frac{\pi}{3}\right)\right) = 2\left(\cos\frac{\pi}{3} - i\sin\frac{\pi}{3}\right)$$
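Python's standard `cmath` module can reproduce Example 5: `cmath.polar` returns the modulus and an argument in the interval $[-\pi, \pi]$, and `cmath.rect` converts back. This is a minimal check in floating-point arithmetic; note that `cmath.phase` can return $-\pi$ for a negative real number whose imaginary part is a negative zero, which differs slightly from the convention in 12.

```python
import cmath
import math

z = complex(1, -math.sqrt(3))      # z = 1 - sqrt(3) i
r, phi = cmath.polar(z)            # modulus and principal argument
print(r, phi, -math.pi / 3)        # r is 2 and phi is -pi/3, up to rounding
print(cmath.rect(r, phi))          # back to (approximately) z
```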







Figure B.7 


Geometric Interpretation of Multiplication and Division of Complex 
Numbers 


We now show how polar forms of complex numbers provide geometric interpretations of multiplication and division. Let

$$z_1 = |z_1|(\cos\phi_1 + i\sin\phi_1) \quad\text{and}\quad z_2 = |z_2|(\cos\phi_2 + i\sin\phi_2)$$

be polar forms of the nonzero complex numbers $z_1$ and $z_2$. Multiplying, we obtain

$$z_1z_2 = |z_1||z_2|\left[(\cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2) + i(\sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2)\right]$$

Now applying the trigonometric identities

$$\cos(\phi_1 + \phi_2) = \cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2$$

$$\sin(\phi_1 + \phi_2) = \sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2$$

yields

$$z_1z_2 = |z_1||z_2|\left[\cos(\phi_1 + \phi_2) + i\sin(\phi_1 + \phi_2)\right] \qquad (13)$$

which is a polar form of the complex number with modulus $|z_1||z_2|$ and argument $\phi_1 + \phi_2$. Thus, we have shown that multiplying two complex numbers has the geometric effect of multiplying their moduli and adding their arguments (Figure B.8).



Figure B.8







Similar kinds of computations show that

$$\frac{z_1}{z_2} = \frac{|z_1|}{|z_2|}\left[\cos(\phi_1 - \phi_2) + i\sin(\phi_1 - \phi_2)\right] \qquad (14)$$

which tells us that dividing complex numbers has the geometric effect of dividing their moduli and subtracting their arguments (both in the appropriate order).


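EXAMPLE 6 Multiplying and Dividing in Polar Form

Use polar forms of the complex numbers $z_1 = 1 + \sqrt{3}\,i$ and $z_2 = \sqrt{3} + i$ to compute $z_1z_2$ and $z_1/z_2$.

Polar forms of these complex numbers are

$$z_1 = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right) \quad\text{and}\quad z_2 = 2\left(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6}\right)$$

(verify). Thus, it follows from 13 that

$$z_1z_2 = 4\left[\cos\left(\frac{\pi}{3} + \frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{3} + \frac{\pi}{6}\right)\right] = 4\left[\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right] = 4i$$

and from 14 that

$$\frac{z_1}{z_2} = 1\cdot\left[\cos\left(\frac{\pi}{3} - \frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{3} - \frac{\pi}{6}\right)\right] = \cos\frac{\pi}{6} + i\sin\frac{\pi}{6} = \frac{\sqrt{3}}{2} + \frac{1}{2}i$$

As a check, let us calculate $z_1z_2$ and $z_1/z_2$ directly:

$$z_1z_2 = (1 + \sqrt{3}\,i)(\sqrt{3} + i) = \sqrt{3} + i + 3i + \sqrt{3}\,i^2 = 4i$$

$$\frac{z_1}{z_2} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i}\cdot\frac{\sqrt{3} - i}{\sqrt{3} - i} = \frac{\sqrt{3} - i + 3i - \sqrt{3}\,i^2}{3 - i^2} = \frac{2\sqrt{3} + 2i}{4} = \frac{\sqrt{3}}{2} + \frac{1}{2}i$$

which agrees with the results obtained using polar forms.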
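The computations of Example 6 can be repeated in floating point with the `cmath` module: convert each factor to polar form with `cmath.polar`, combine the moduli and arguments as in 13 and 14, and convert back with `cmath.rect`. This Python sketch compares the results with ordinary complex multiplication and division.

```python
import cmath
import math

z1 = complex(1, math.sqrt(3))      # 1 + sqrt(3) i
z2 = complex(math.sqrt(3), 1)      # sqrt(3) + i
r1, phi1 = cmath.polar(z1)         # approximately 2 and pi/3
r2, phi2 = cmath.polar(z2)         # approximately 2 and pi/6

product = cmath.rect(r1 * r2, phi1 + phi2)    # Formula 13: multiply moduli, add arguments
quotient = cmath.rect(r1 / r2, phi1 - phi2)   # Formula 14: divide moduli, subtract arguments

print(product, z1 * z2)      # both approximately 4i
print(quotient, z1 / z2)     # both approximately sqrt(3)/2 + i/2
```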


The complex number i has a modulus of 1 and a principal argument of $\pi/2$. Thus, if z is a complex number, then $iz$ has the same modulus as z but its argument is greater by $\pi/2$ (= 90°); that is, multiplication by i has the geometric effect of rotating the vector z counterclockwise by 90° (Figure B.9).










Figure B.9 


DeMoivre's Formula 


If n is a positive integer, and if z is a nonzero complex number with polar form

$$z = |z|(\cos\phi + i\sin\phi)$$

then raising z to the nth power yields

$$z^n = \underbrace{z \cdot z \cdots z}_{n\text{ factors}} = |z|^n\Big[\cos(\underbrace{\phi + \phi + \cdots + \phi}_{n\text{ terms}}) + i\sin(\underbrace{\phi + \phi + \cdots + \phi}_{n\text{ terms}})\Big]$$

which we can write more succinctly as

$$z^n = |z|^n(\cos n\phi + i\sin n\phi) \qquad (15)$$

In the special case where $|z| = 1$ this formula simplifies to

$$z^n = \cos n\phi + i\sin n\phi$$

which, using the polar form for z, becomes

$$(\cos\phi + i\sin\phi)^n = \cos n\phi + i\sin n\phi \qquad (16)$$

This result is called DeMoivre's formula.
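DeMoivre's formula can be spot-checked numerically by raising $\cos\phi + i\sin\phi$ to several powers and comparing with $\cos n\phi + i\sin n\phi$. The angle and the range of n in this Python sketch are arbitrary choices for illustration; the check is in floating point, so it compares up to a tolerance.

```python
import cmath
import math

phi = 0.7                                         # an arbitrary angle in radians
unit = complex(math.cos(phi), math.sin(phi))      # cos(phi) + i sin(phi), modulus 1

for n in range(1, 9):
    lhs = unit ** n                                         # repeated multiplication
    rhs = complex(math.cos(n * phi), math.sin(n * phi))     # right side of Formula 16
    print(n, cmath.isclose(lhs, rhs))

# Formula 15 for a z whose modulus is not 1:
z = 2 * unit
print(cmath.isclose(z ** 5, 2 ** 5 * complex(math.cos(5 * phi), math.sin(5 * phi))))
```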


Euler's Formula 

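If $\theta$ is a real number, say the radian measure of some angle, then the complex exponential function $e^{i\theta}$ is defined to be

$$e^{i\theta} = \cos\theta + i\sin\theta \qquad (17)$$

which is sometimes called Euler's formula. One motivation for this formula comes from the Maclaurin series in calculus. Readers who have studied infinite series in calculus can deduce 17 by formally substituting $i\theta$ for x in the Maclaurin series for $e^x$ and writing

$$\begin{aligned} e^{i\theta} &= 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \frac{(i\theta)^6}{6!} + \cdots \\ &= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \frac{\theta^6}{6!} - \cdots \\ &= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) \\ &= \cos\theta + i\sin\theta \end{aligned}$$

where the last step follows from the Maclaurin series for $\cos\theta$ and $\sin\theta$.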
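The series argument can be imitated numerically by summing the first terms of the Maclaurin series for $e^x$ at $x = i\theta$ and comparing with $\cos\theta + i\sin\theta$. In this Python sketch the number of terms and the angle are arbitrary choices; `exp_series` is a hypothetical helper, and `cmath.exp` is used only as an independent reference value.

```python
import cmath
import math

def exp_series(z: complex, terms: int = 30) -> complex:
    """Partial sum 1 + z + z^2/2! + ... + z^(terms-1)/(terms-1)! of the series for e^z."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)          # turn z^k/k! into z^(k+1)/(k+1)!
    return total

theta = 1.2
approx = exp_series(1j * theta)
euler = complex(math.cos(theta), math.sin(theta))     # right side of Formula 17
print(cmath.isclose(approx, euler), cmath.isclose(cmath.exp(1j * theta), euler))   # True True
```

The same partial sums also agree with $e^a(\cos b + i\sin b)$ for a general exponent $z = a + bi$, as in the definition of $e^z$ that follows.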


If $z = a + bi$ is any complex number, then the complex exponential $e^z$ is defined to be

$$e^z = e^{a + bi} = e^a e^{ib} = e^a(\cos b + i\sin b)$$

It can be proved that complex exponentials satisfy the standard laws of exponents. Thus, for example,

$$e^{z_1}e^{z_2} = e^{z_1 + z_2}, \qquad \frac{e^{z_1}}{e^{z_2}} = e^{z_1 - z_2}, \qquad \frac{1}{e^z} = e^{-z}$$





Answers to Exercises


Exercise Set 1.1 

(a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations 
(a) and (d) are linear systems; (b) and (c) are not linear systems 
(a) and (d) are both consistent 

(a), (d), and (e) are solutions; (b) and (c) are not solutions 



y = t 


11. 


b. 

*1 

= 

}■-§ 

•+*-* 



*2 

— 

r 




*3 

= 

s 




X 4 

= 

* 



a. $2x_1 = 0$, $3x_1 - 4x_2 = 0$, $x_2 = 1$

b. $3x_1 - 2x_3 = 5$, $7x_1 + x_2 + 4x_3 = -3$, $-2x_2 + x_3 = 7$

c. $7x_1 + 2x_2 + x_3 - 3x_4 = 5$, $x_1 + 2x_2 + 4x_3 = 1$

d. $x_1 = 7$, $x_2 = -2$, $x_3 = 3$, $x_4 = 4$


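13. a. $\begin{bmatrix} -2 & 6 \\ 3 & 8 \\ 9 & -3 \end{bmatrix}$

b. $\begin{bmatrix} 6 & -1 & 3 & 4 \\ 0 & 5 & -1 & 1 \end{bmatrix}$

c. $\begin{bmatrix} 0 & 2 & 0 & -3 & 1 & 0 \\ -3 & -1 & 1 & 0 & 0 & -1 \\ 6 & 2 & -1 & 2 & -3 & 6 \end{bmatrix}$

d. $\begin{bmatrix} 1 & 0 & 0 & 0 & -1 & 7 \end{bmatrix}$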


True/False 1.1

(a) True
(b) False
(c) True
(d) True
(e) False
(f) False
(g) True
(h) False

Exercise Set 1.2 

a. Both 
Both 
Both 
Both 
Both 
Both 

Row echelon 

a. $x_1 = -37$, $x_2 = -8$, $x_3 = 5$

b. $x_1 = 13t - 10$, $x_2 = 13t - 5$, $x_3 = -t + 2$, $x_4 = t$

c. $x_1 = -7s + 2t - 11$, $x_2 = s$, $x_3 = -3t - 4$, $x_4 = -3t + 9$, $x_5 = t$


3 . 






Inconsistent 

5. $x_1 = 3$, $x_2 = 1$, $x_3 = 2$


7. $x = t - 1$, $y = 2s$, $z = s$, $w = t$
9. $x_1 = 3$, $x_2 = 1$, $x_3 = 2$
11. $x = t - 1$, $y = 2s$, $z = s$, $w = t$
13. Has nontrivial solutions
15. Has nontrivial solutions
17. $x_1 = 0$, $x_2 = 0$, $x_3 = 0$
19. *1 = — s, * 2 = *3 = 4 s, X4 = t 

21. $w = t$, $x = -t$, $y = t$, $z = 0$
23. $I_1 = -1$, $I_2 = 0$, $I_3 = 1$, $I_4 = 2$

25. If $a = 4$, there are infinitely many solutions; if $a = -4$, there are no solutions; if $a \ne \pm 4$, there is exactly one solution.
27. If $a = 3$, there are infinitely many solutions; if $a = -3$, there are no solutions; if $a \ne \pm 3$, there is exactly one solution.

29. $x = \dfrac{2a}{3} - \dfrac{b}{9}$, $y = -\dfrac{a}{3} + \dfrac{2b}{9}$


31. 


0 ?] and 

"l 0 " 

_0 1_ 

* = ± 1 , y 

= ±i! 


are possible answers. 


37 . (3 = 1, 2> = — 6 , c —2, ii = 10 

The nonhomogeneous system will have exactly one solution. 

True/False 1.2

(a) True
(b) False
(c) False
(d) True
(e) True
(f) False
(g) True
(h) False
(i) False

Exercise Set 1.3 

1. a. Undefined
b. 4 × 2
c. Undefined
d. Undefined
e. 5 × 5
f. 5 × 2
g. Undefined
h. 5 × 2


b. 


d. 


7 

-2 

7 

-5 

0 

-1 

15 

-5 

5 


6 5 
1 3 
3 7 

4 

-1 

1 

0 

10 

5 


' _7 —28 -14] 
—21 -7 -35J 


Undefined 


f. 


22 -6 8 

-2 

10 
-39 
9 


-33 


4 6 
0 4 
-21 -24 
-6 -15 
-12 -30 












0 0 
0 0 


j. - 25 
168 

Undefined 

a. 12—3 
—4 5 

4 1 

Undefined 


c. 

"42 

108 

75" 


12 

-3 


21 


36 

78 

63 

d. 

" 3 

45 

9" 


11 

-11 

17 


_ 7 

17 

13 

e. 

" 3 

45 

9" 


11 

-11 

17 


_ 7 

17 

13 

f. 

"21 

17" 




17 

35_ 



g- 

' 0 

-2 


11" 


12 

1 


8_ 

h. 

"12 


6 

9" 


48 

-20 

14 


24 

8 

16 


61 

j. 35 

k. 28 
99 


7. 


a. 

b. 


c. 


d. 


f. 


[6741 41] 
[63 67 57] 

"41" 

21 

67 

' 6 " 

6 

63_ 

’24 56 97] 

"76" 

98 

97 


a. 

-3 


3 



■2 


12 



3 


-2 


7 


76 


3 


-2 



48 

= 3 

6 

4= 6 


5 


29 

= - 

-2 

6 

+ 5 

5 

+ 4 

4 


98 

= 7 

6 

+ 4 

5 

+ 9 


24 


0 



4 


56 



0 


4 


9 


97 


0 


4 


b. 

64 


6 


4 


14 



6 


-2 


4 


38 


6 


-2 


4 


21 

= 6 

0 

4=7 

3 


22 

= 

-2 

0 

4= 

1 

4=7 

3 


18 

= 4 

0 

4= 3 

1 

+ 5 

3 


77 


7 


5 


28 



7 


7 


5 


74 


7 


7 


5 


11. a. [2 -3 5; 9 -1 1; 1 5 4] [x₁; x₂; x₃] = [7; -1; 0]

b. [4 0 -3 1; 5 1 0 -8; 2 -5 9 -1; 0 3 -1 7] [x₁; x₂; x₃; x₄] = [1; 3; 0; 2]


a. 5x₁ + 6x₂ - 7x₃ = 2
-x₁ - 2x₂ + 3x₃ = 0
4x₂ - x₃ = 3









































































2 

2 

-9 


b. 

*1 


*2 

+ 

*3 

= 

2x\ 

+ 

3*2 



= 

5*1 

- 

3*2 

- 

6x3 

= 

-1 







II 

b = 

-6, 

c = 

-1 , d = 

1 

a.
[a₁₁ 0 0 0 0 0]
[0 a₂₂ 0 0 0 0]
[0 0 a₃₃ 0 0 0]
[0 0 0 a₄₄ 0 0]
[0 0 0 0 a₅₅ 0]
[0 0 0 0 0 a₆₆]

b.
[a₁₁ a₁₂ a₁₃ a₁₄ a₁₅ a₁₆]
[0 a₂₂ a₂₃ a₂₄ a₂₅ a₂₆]
[0 0 a₃₃ a₃₄ a₃₅ a₃₆]
[0 0 0 a₄₄ a₄₅ a₄₆]
[0 0 0 0 a₅₅ a₅₆]
[0 0 0 0 0 a₆₆]

c.
[a₁₁ 0 0 0 0 0]
[a₂₁ a₂₂ 0 0 0 0]
[a₃₁ a₃₂ a₃₃ 0 0 0]
[a₄₁ a₄₂ a₄₃ a₄₄ 0 0]
[a₅₁ a₅₂ a₅₃ a₅₄ a₅₅ 0]
[a₆₁ a₆₂ a₆₃ a₆₄ a₆₅ a₆₆]

d.
[a₁₁ a₁₂ 0 0 0 0]
[a₂₁ a₂₂ a₂₃ 0 0 0]
[0 a₃₂ a₃₃ a₃₄ 0 0]
[0 0 a₄₃ a₄₄ a₄₅ 0]
[0 0 0 a₅₄ a₅₅ a₅₆]
[0 0 0 0 a₆₅ a₆₆]


,/*l\ /*l+*2\ 

7 r 2 / l *2 j 



b. 



c. 






i 


fix) 


± 

i 


1 x 

2 




1 


/<*).= * , 

I 2 










I 


/(*) . 


27. 


'1 1 0" 

One; namely, A — 

1 -1 0 



0 0 0 

a ' \] 11 

and 

h "ii 



L-i -ij 


b. 


Four; 


True/False 1.3 

(a) True
(b) False
(c) False
(d) False
(e) True
(f) False
(g) False
(h) True
(i) True
(j) True
(k) True
(l) False
(m) True
(n) True
(o) False

Exercise Set 1.4 


\f 5 Ol 

\-f 5 0 


{5 0 


-{5 0 

[o 3 J 

[ 0 3 _ 


. 0 ~ 3 . 


0 —3 


5. 


1 J_ 

5 20 

5 10 

2 ° 

0 i 


ic z +o 


15. 


A = 


f * 

1 1 
7 7 


17. 


19 . 


_9_ 
" 13 


13 

_ 2 _ 

13 13 


b. 


41 15 
30 11 
11 -15 
-30 41 

6 2 
4 2 


d.n i] 

.2 -lj 

) 71 

1 6j 

























1 


21 . 


b. 


d. 


27. 


1 

*11 


39 13 

26 13 

'27 0 0 

0 26 -18 
0 18 26 

27 0 0 

0 0.026 0.018 
_ 0 -0.018 0.026 
~4 0 0 

0 —5 -12 
_0 12 -5 

'1 0 0 ^ 

0-3 3 

_0 -3 -3 
16 0 0 
0 -14 -15 
0 15 -14 

'25 0 0~ 

0 32 -24 
0 24 32 

0 ■ • • 0 

-L- ■ ■ ■ 0 

*22 

0 ■ ■ • -±- 

*«« 


L D=CA- 1 B- 1 A- 2 BC 2 {b T) j 1 


33. B~ 
35. 


a-1. 


37. 


A~ l = 


ill 

“2 2 2 

1 _1 1 

2 2 2 

1 1 _! 

2 2 2 

II 1 

"2 2 2 

1 0 0 


39. * 1= -L xo = ll 


23’ 


x 2 = 


23 


41. x\ — ro = A. 

Xl ir 2 li 

True/False 1.4 


(a) False
(b) False
(c) False
(d) False
(e) False
(f) True
(g) True
(h) True
(i) False
(j) True
(k) False


Exercise Set 1.5 

a. Elementary
b. Not elementary
c. Not elementary
d. Not elementary


Add 3 times row 2 to row 1 


J 1 3 
v 


Multiply row 1 by — 


Add 5 times row 1 to row 3: 


0 0 

0 1 0 
0 0 1 
1 0 0 
0 1 0 
5 0 1 


Swap rows 1 and 3:

[0 0 1 0]
[0 1 0 0]
[1 0 0 0]
[0 0 0 1]


a r 3 -6 

Swap rows 1 and 2: EA =

-6 -6~ 
5 -1_ 




b. 

2 -1 

0 

-4 

-4 

Add -3 times row 2 to row 3: EA =

1 -3 

-1 

5 

3 


-1 9 

4 

-12 

-10 

c. [13 28' 





Add 4 times row 3 to row 1 : EA = 


a. 

0 

0 

r 


0 

1 

0 


1 

0 

0 

b. 

0 

0 

r 


0 

1 

0 


1 

0 

0 

c. 


1 

0 0 



0 

1 0 



■2 

0 1 

d. 

'1 

0 

o' 
1

0 


2 

0 

1 

-7 

1 

n 


2 

-1 

J 



11. 2 3 

7 7 

3 1 
_7 7 

13. [ 3 
2 

-1 

2 


li 

10 5 

1 1 
_7_ 2 

10 5 


15. No inverse 


17. 


1 

2 

1 

2 

1 

2 


-1 

0 


_1 1 

2 2 

1 1 

2 2 

1 _1 

2 2 

0 -3 

1 0 
= 1 1 


19 . 

























21 . 


23. 


25. 


29. 


1 


1 



; o 


4 


2 





1 


1 


3 

0 


8 


4 


2 



0 


0 


1 

0 






2 



1 


1 


1 

1 


40 


'20 


'10 

5 


7 


5 


5 

r 


12 


24 
4


5 


5 


1 

1 


6 


12 

4 

2 


5 


5 


5 

1 


12 


24 

8 

4 


1 


1 


1 

1 


12 


24 

8 

4 


a. 


1 

0 

C 

) 0 



*1 






0 

J_ 

0 0 





*2 





0 

0 

J 

- 0 






k 3 




0 

0 

C 

i -L 







£4 


b. 

'l 


.1 

0 

0 



k 


k 





0 

1 

0 

0 



c 

) 

0 

1 

_1 






k 

k 



0 

0 

0 

1 


^ * 0, 1 





-3 

1 


1 

0 

1 1 

-4 0 

2 

2 


0 

2 

0 1 

0 1 


!?] 


"l 

0 

0~ 

"l 

0 

0~ 

0 

1 

3 

0 

4 

0 

0 

0 

1 

0 

0 

1 


31. 10-2] [10-2 
0 4 3 = 0 1 0 

0 0 1 0 0 1 

33. [-1/4 1/8; 1/4 1/8]

35 . 1 0 2 

0 1 "I 

4 4 

0 0 1 

Add — 1 times the first row to the second row. Add — ] times the first row to the third row. Add — \ times the second row to the first row. Add the second row to 
the third row. 



1 O' 

“7 0 

‘1 -f 

'1 O' 


-1 1_ 

4 

0 1_ 

_0 1_ 

.° i 



"1 

0 

1 

4 

o' 

'1 

0 

o' 

"1 

0 

2" 
0

0 

0 

1 

-3 

0 

1 
0

0 

1 

0 

0 

1 

0 

0 

1 


True/False 1.5 

(a) False
(b) True
(c) True
(d) True
(e) True
(f) True
(g) False


Exercise Set 1.6 
1. x₁ = 3, x₂ = -1
3. x₁ = -1, x₂ = 4, x₃ = -7
5. x = 1, y = 5, z = -1
7. x₁ = 2b₁ - 5b₂, x₂ = -b₁ + 3b₂


i- *, = 22. 
1? , 

17 ’ 


*1 : 


*2 =17 

11 

17 


*2 z 


9 . 









































11 . 


L *1 = 


11 n = 


15 

34 


> x 2 = 


A_ 
15 
28 

, 1 = - * 2 = - 

iiL xl= ii * 9= n 
X1 15 ’ * 2 15 

1_3 

5 


* 1 = - *2 = 


13. No conditions on b₁ and b₂
15. ^3 = ^1 ~h 

17. b₁ = b₃ + b₄, b₂ = 2b₃ + b₄


19. X = [11 12 -3 27 26; -6 -8 1 -18 -17; -15 -21 9 -38 -35]


True/False 1.6 

(a) True
(b) True
(c) True
(d) True
(e) True
(f) True
(g) True

Exercise Set 1.7 

1. 


-k 0 


0 - 

-1 

0 

0 

6 


1 

5 

0 0 


0 3 
3 

4 -1 
4 10 

-15 10 

2 -10 
18 -6 


A 2 = 


11 . 


0 20 - 20 ' 


6 0 6 


I 

<?i 

1 

o\ 

: ~ 6 . 



"1 O' 

r 

ih> 

L 

II 

0 - 
4 

, A~ k = 


1 / (—2)* 


o -7T 


o 

o 


0 9 0 

II 

f 

o 

o 



0 4* 


Not symmetric 
Symmetric 
Not symmetric 
19. Not symmetric 
Not invertible 
23. = - 8 

25. x ≠ 1, -2, 4


27. 


35 . 


[1 0 0]
[0 -1 0]
[0 0 -1]

a. Yes
b. No (unless n = 1)
c. Yes
d. No (unless n = 1)























39. 


43. 


[0 0 -8]
[0 0 -4]
[8 4 0]

1 10 

0 -2 


A = 


True/False 1.7 

(a) True
(b) False
(c) False
(d) True
(e) True
(f) False
(g) False
(h) True
(i) True
(j) False
(k) False
(l) False
(m) True


Exercise Set 1.8 

1. 


50 



3. a. x₃ - x₄ = -500, -x₁ + x₄ = 100, x₁ - x₂ = 300, x₂ - x₃ = 100

b. x₁ = -100 + t, x₂ = -400 + t, x₃ = -500 + t, x₄ = t

c. For all rates to be nonnegative, we need t ≥ 500 cars per hour; at the minimum, t = 500, the flows are x₁ = 400, x₂ = 100, x₃ = 0, x₄ = 500.
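A quick way to confirm the traffic-flow answers is to substitute the general solution from part b into the four node equations from part a, then search for the smallest t that keeps every flow nonnegative. The Python sketch below does exactly that; the helper names `flows` and `satisfies_system` are illustrative, and the search only scans integer values of t.

```python
def flows(t):
    """General solution (x1, x2, x3, x4) from part b, as a function of t."""
    return (t - 100, t - 400, t - 500, t)


def satisfies_system(x):
    """True if x satisfies the four node equations from part a."""
    x1, x2, x3, x4 = x
    return (x3 - x4 == -500 and -x1 + x4 == 100
            and x1 - x2 == 300 and x2 - x3 == 100)


# Every value of t satisfies the node equations...
assert all(satisfies_system(flows(t)) for t in range(0, 2001, 50))

# ...but all four flows are nonnegative only when t >= 500.
t_min = min(t for t in range(0, 2001) if min(flows(t)) >= 0)
print(t_min, flows(t_min))  # 500 (400, 100, 0, 500)
```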

5- l\ — -y-A, / 2 = -|A, / 3 =+A 

7. / 1= / 4 = / 5 = / 6 = Ia, / 2 = / 3 = 0A 

9. x₁ = 1, x₂ = 5, x₃ = 3, and x₄ = 4; the balanced equation is C3H8 + 5O2 → 3CO2 + 4H2O
11. x₁ = x₂ = x₃ = x₄ = t; the balanced equation is CH3COF + H2O → CH3COOH + HF
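Both balancing answers amount to finding a positive integer vector in the null space of an atom-count matrix, with one row per element and the product columns negated. As a sketch (not necessarily the book's own procedure), the Python below row-reduces with exact fractions; the helper `null_vector` is hypothetical and assumes the null space is one-dimensional, which holds for both reactions here.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def null_vector(rows):
    """Return a primitive integer vector spanning the null space of `rows`.

    Assumes the null space is one-dimensional, as in a balancing problem.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0])
    pivots, r = [], 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):  # clear column c in every other row
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    assert len(free) == 1, "expected a one-dimensional null space"
    x = [Fraction(0)] * n_cols
    x[free[0]] = Fraction(1)
    for row, c in enumerate(pivots):
        x[c] = -m[row][free[0]]
    scale = reduce(lcm, (v.denominator for v in x))
    ints = [int(v * scale) for v in x]
    g = reduce(gcd, ints)
    return [v // g for v in ints]


# Columns: C3H8, O2, CO2, H2O; rows: C, H, O (product columns negated).
propane = [[3, 0, -1, 0],
           [8, 0, 0, -2],
           [0, 2, -2, -1]]
# Columns: CH3COF, H2O, CH3COOH, HF; rows: C, H, O, F.
acetyl_fluoride = [[2, 0, -2, 0],
                   [3, 2, -4, -1],
                   [1, 1, -2, 0],
                   [1, 0, 0, -1]]

print(null_vector(propane))          # [1, 5, 3, 4]
print(null_vector(acetyl_fluoride))  # [1, 1, 1, 1]
```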
13. p(x) = x² - 2x + 2
IS- p W = l + ^x-Ix 3 

17. a. Using a₁ = k as a parameter, p(x) = 1 + kx + (1 - k)x², where -∞ < k < ∞.

The graphs for k = 0, 1, 2, and 3 are shown.

[Figure: graphs of p(x) for k = 0, 1, 2, 3]


True/False 1.8 

(a) True

(b) False

(c) True

(d) False

(e) False

Exercise Set 1.9 

a. [0.50 0.25; 0.25 0.10]

b. [$25,290; $22,581]

3. a. [0.1 0.6 0.4; 0.3 0.2 0.3; 0.4 0.1 0.2]

b. [$31,500; $26,500; $26,300]

5. [123.08; 202.56]
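The answer to Exercise 3 lists the consumption matrix C and the production vector x, but not the outside demand d that x meets through (I - C)x = d. The Python sketch below computes the demand implied by the listed production vector, using exact fractions so no rounding creeps in. Treat it as a consistency check under that reading, not as the exercise's stated data.

```python
from fractions import Fraction as F

# Consumption matrix from part a (decimal strings give exact fractions).
C = [[F("0.1"), F("0.6"), F("0.4")],
     [F("0.3"), F("0.2"), F("0.3")],
     [F("0.4"), F("0.1"), F("0.2")]]
# Production vector from part b, in dollars.
x = [31500, 26500, 26300]


def outside_demand(C, x):
    """Return d = (I - C) x, the outside demand met by production vector x."""
    n = len(x)
    return [x[i] - sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]


d = outside_demand(C, x)
print([int(v) for v in d])  # [1930, 3860, 5790]
```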

True/False 1.9 

(a) False
(b) True

(c) False
(d) True

(e) True

Chapter 1 Supplementary Exercises 

1 . 3x\ — X 2 + *4 = 1 

2 x\ + 3 x2 + 3*4 = — 1 

3 3 1 9 15 

* 1 = 2 *- 2 '" 2 ’ * 2 = “ 2 S “ 2 '" 2 ’ * 3 = "’ * 4 
3. 2x₁ - 4x₂ + x₃ = 6
-4x₁ + 3x₃ = -1
x₂ - x₃ = 3



7. x = 4, y = 2, z = 3

a. a ≠ 0, b ≠ 2

b. a ≠ 0, b = 2

c. a = 0, b = 2

d. a = 0, b ≠ 2



113 160 

37 37 


c. 


37 37 


15. a = 1, b = -2, c = 3

Exercise Set 2.1 

1. M₁₁ = 29, C₁₁ = 29
M₁₂ = 21, C₁₂ = -21
M₁₃ = 27, C₁₃ = 27
M₂₁ = -11, C₂₁ = 11
M₂₂ = 13, C₂₂ = 13
M₂₃ = -5, C₂₃ = 5
M₃₁ = -19, C₃₁ = -19
M₃₂ = -19, C₃₂ = 19
M₃₃ = 19, C₃₃ = 19
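Each cofactor in Exercise 1 is the matching minor with the sign (-1)^(i+j) applied, so the two columns of the answer can be checked against each other. A minimal Python sketch, keyed by 1-based (i, j) as in the text:

```python
# Minors M_ij from Exercise 1, keyed by (i, j) with 1-based indices.
minors = {(1, 1): 29, (1, 2): 21, (1, 3): 27,
          (2, 1): -11, (2, 2): 13, (2, 3): -5,
          (3, 1): -19, (3, 2): -19, (3, 3): 19}


def cofactor(i, j, minor):
    """C_ij = (-1)^(i + j) M_ij: the sign flips when i + j is odd."""
    return (-1) ** (i + j) * minor


cofactors = {(i, j): cofactor(i, j, m) for (i, j), m in minors.items()}
print(cofactors)
```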

3. a. M₁₃ = 0, C₁₃ = 0

b. M₂₃ = -96, C₂₃ = 96

c. M₂₂ = -48, C₂₂ = -48

d. M₂₁ = 72, C₂₁ = -72


5. 

" 2 

5 ' 

22; 

11 

22 

1 

3 


11 

22 

7. 

2 

7 

59; 

59 

59 

7 

5 


59 

59 


9. a² - 5a + 21
11. -65
13. -123
15. λ = 1 or -3
17. λ = 1 or -1
19. (all parts) -123

21. -40

23. 0 
25. -240 
27. -1
29. 0 
31. 6 

33. The determinant is sin²θ + cos²θ = 1.

35. d₂ = d₁ + λ


True/False 2.1 

(a) False

(b) False

(c) True

(d) True

(e) True

(f) False

(g) False

(h) False

(i) True

Exercise Set 2.2 

5. -5 
7. -1 

9. 1 

5 

33 

6 

17. -2 


19. Exercise 14: 39; Exercise 15: 6; Exercise 16: -1/6; Exercise 17: -2


21. -6
23. 72
25. -6
27. 18

True/False 2.2 

(a) True

(b) True

(c) False

(d) False

(e) True

(f) True

Exercise Set 2.3 

Invertible 

Invertible 





Not invertible 
13. Invertible 

15 - , t . 1±j[E 
** 2 
17. k ≠ -1

19. [3 -5 -5; -3 4 5; 2 -2 -3]


21 . 


23. 


1 2 
2 2 


1 I 


0 0^ 


-4 3 

2 =1 

-7 

6 


0 -1 
0 0 
0-1 8 
0 1 -7 


5.* = x , = 2L Z= _X 


11 


11 


11 


27 30 38 

*1 = “ "fp *2 = - -jj’ * 3 = 

29. Cramer's rule does not apply. 

31. y = 0 


a. —189 

b. _1 

7 

c. _2 

7 

d. _L 

56 

e. 7 

189 

b. 1 

7 

c. I 

7 

d. _L 

56 


40 

11 


True/False 2.3 

(a) False
(b) False
(c) True
(d) False
(e) True
(f) True
(g) True
(h) True
(i) True
(j) True
(k) True
(l) False

Chapter 2 Supplementary Exercises 
1. -18 

3. 24

5. -10 

7. 329 

Exercise 3: 24; Exercise 4: 0; Exercise 5: -10; Exercise 6: -48
The matrices in Exercises 1-3 are invertible; the matrix in Exercise 4 is not.
13. -b² + 5b - 21
15. -120 








17 . 


19 . 


21 . 


23 . 


1 1 

‘6 9 

1 2 

6 9 


1 

1 

3 

8 

8 

8 

1 

5 

1 

8 

24 

24 

1 

7 

1 

4 

"12 

12 

1 

2 

1 ' 

5 

5 

10 

1 

3 

2 

5 

5 

5 

2 

6 

3 

5 

5 

10 


10 

2 

52 

27 

329 

329 

329 

329 

55 

11 

43 

16 

329 

329 

329 

329 

3 

10 

25 

6 

47 

47 

47 

47 

31 

72 

102 

15 

'329 

329 

329 

329 


s . *'-!*+ f * 


29. (b) cos β = (a² + c² - b²)/(2ac), cos γ = (a² + b² - c²)/(2ab)


Exercise Set 3.1 

1. [Figure: the points (3, 4, 5), (3, 4, -5), (-3, -4, 5), and (-3, 4, -5) plotted in three-dimensional coordinates]


3. [Figure: sketches of the vectors for parts a-f]


a. P₁P₂ = (-1, 3)
b. P₁P₂ = (-3, 6, 1)

The terminal point is B(2, 3). 

The initial point is A(-2, -2, -1).

a. u = (-1, 2, -4) is one possible answer.

b. u = (7, -2, -6) is one possible answer.

13. a. u + w = (1, -4)

b. v - 3u = (-12, 8)

c. 2(u - 5w) = (38, 28)

d. 3v - 2(u + 2w) = (4, 29)

e. -3(w - 2u + v) = (33, -12)
f. (-2u - v) - 5(v + 3w) = (37, 17)
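All six results in Exercise 13 are consistent with u = (4, -1), v = (0, 5), and w = (-3, -3). Those vectors are an assumption here: they come from solving parts a through c for u, v, and w, not from the exercise statement. The Python sketch below recomputes every part from them with two small tuple helpers.

```python
def add(*vectors):
    """Componentwise sum of any number of equal-length tuples."""
    return tuple(sum(parts) for parts in zip(*vectors))


def scale(k, v):
    """Scalar multiple k v."""
    return tuple(k * c for c in v)


# Assumed vectors, obtained by solving parts a-c for u, v, w.
u, v, w = (4, -1), (0, 5), (-3, -3)

results = {
    "a": add(u, w),                                          # u + w
    "b": add(v, scale(-3, u)),                               # v - 3u
    "c": scale(2, add(u, scale(-5, w))),                     # 2(u - 5w)
    "d": add(scale(3, v), scale(-2, add(u, scale(2, w)))),   # 3v - 2(u + 2w)
    "e": scale(-3, add(w, scale(-2, u), v)),                 # -3(w - 2u + v)
    "f": add(scale(-2, u), scale(-1, v),
             scale(-5, add(v, scale(3, w)))),                # (-2u - v) - 5(v + 3w)
}
for part, value in results.items():
    print(part, value)
```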

a. (-1, 9, -11, 1)

b. (22, 53, -19, 14)

c. (-13, 13, -36, -2)

d. (-90, -114, 60, -36)

e. (-9, -5, -5, -3)

f. (27, 29, -27, 9)

a. w - u = (-9, 3, -3, -8, 5)

b. 2v + 3u = (13, -5, 14, 13, -9)

c. -w + 3(v - u) = (-14, -2, 24, 2, 7)

d. 5(-v + 4u - w) = (125, -25, -20, 75, -70)

e. -2(3w + v) + (2u + w) = (32, -10, 1, 27, -16)

f. (1/2)(w - 5v + 2u) + v = (9/2, 3/2, -12, -5/2, -2)

19. a. v - w = (-2, 1, -4, -2, 7)

b. 6u + 2v = (-10, 6, -4, 26, 28)

c. (2u - 7w) - (8v + u) = (-77, 8, 94, -25, 23)



Not parallel 

Parallel 

Parallel 

25. a = 3, b = -1 
27. c₁ = 2, c₂ = -1, c₃ = 5
29. c₁ = 1, c₂ = 1, c₃ = -1, c₄ = 1

a. ( 9 _ 1 n 

U' 2’ 2 J 

b. /23 _9 n 

\ 4 4’ 4) 

True/False 3.1 

(a) False
(b) False
(c) False
(d) True
(e) True
(f) False
(g) False
(h) True
(i) False
(j) True
(k) False

Exercise Set 3.2 



3. a. ||u + v|| = √83

b. ||u|| + ||v|| = √17 + √26

c. ||-2u + 2v|| = 2√3

d. ||-3u - 5v + w|| = √466

a. ||3u - 5v + w|| = √2570

b. ||3u|| - 5||v|| + ||w|| = 3√46 - 10√21 + √42
«• II - IMMl = 2/966 

7 . fc=5 k= _l 

K r K 1 

a. u · v = -8, u · u = 26, v · v = 24
b. u · v = 0, u · u = 54, v · v = 21

a. ||u - v|| = √14

b. ||u - v|| = √59

c. ||u - v|| = √677


11 . 


13. 


a. cos θ = 15/(√27 √17); θ is acute


b cos 9 = — 


{s{A5 ; 


0 is obtuse 


c. cos θ = -136/(√225 √110); θ is obtuse


15. 




a · b = 45


17. 


19. 


a. u · (v · w) does not make sense because v · w is a scalar.

b. u · (v + w) makes sense.

c. ||u · v|| does not make sense because the quantity inside the norm is a scalar.
d. (u · v) - ||u|| makes sense since the terms are both scalars.


(4 -I) 


(5/2 ■ 5/2) 

_3 1 J 3 

4’ 2’ 4 


_2_3_4_5_ 

{{55’ {55* / 55 ’ / 55 ’ /55 


23. 


;0 = - 


11 


/%2 


b- cos 0=-=- 

/10 


25. 


cos 0 = 0 
cos 0 = 0 

|u · v| = 10, ||u|| ||v|| = √13 √17 ≈ 14.866
|u · v| = 7, ||u|| ||v|| = √10 √14 ≈ 11.832
|u · v| = 5, ||u|| ||v|| = (3)(2) = 6
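These three checks of the Cauchy-Schwarz inequality compare |u · v| with the product ||u|| ||v||. The Python snippet below re-evaluates the listed norm products and confirms the inequality in each case. The numbers come from the answers themselves, because the underlying vectors are not reproduced in this answer key.

```python
import math

# (|u . v|, ||u||, ||v||) for the three cases, as listed in the answers.
cases = [(10, math.sqrt(13), math.sqrt(17)),
         (7, math.sqrt(10), math.sqrt(14)),
         (5, 3.0, 2.0)]

products = [round(nu * nv, 3) for _, nu, nv in cases]
holds = [dot_abs <= nu * nv for dot_abs, nu, nv in cases]
print(products)  # [14.866, 11.832, 6.0]
print(holds)     # [True, True, True]
```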


7. A sphere of radius 1 centered at (x₀, y₀, z₀).

True/False 3.2 

(a) True
(b) True
(c) False
(d) True
(e) True
(f) False
(g) False
(h) False
(i) True
(j) True

Exercise Set 3.3 

a. Orthogonal
b. Not orthogonal
c. Not orthogonal
d. Not orthogonal


a. Not an orthogonal set
b. Orthogonal set
c. Orthogonal set
d. Not an orthogonal set


5. 


± 


Ik k 



Yes 






9. -2(x + 1) + (y - 3) - (z + 2) = 0

11 2 z = 0 
13. Not parallel 
Parallel 


Not perpendicular 

19 - a. 2 

5 

b. _1S_ 

fn 

21. (0. 0) (6, 2) 

23. 


25. 


27. 


(-#»• -§)■ (§•'■ -ft) 

('if) 

n _1 _1_ _ M (9 6 _9_ 21A 
\5’ 5’ 10’ 10 }' ^5’ 5’ 10’ 10 J 


9. 1 

3i. m 


5 
3 

1 

/29 

0 (The planes coincide.) 

b) cos 0=ii^T’ cos 'y=i6 r 


33. 


35. 


37. 


True/False 3.3 

(a) True
(b) True
(c) True
(d) True
(e) True
(f) False
(g) False

Exercise Set 3.4 

1. Vector equation: (x, y) = (-4, 1) + t(0, -8);

parametric equations: x = -4, y = 1 - 8t
3. Vector equation: (x, y, z) = t(-3, 0, 1);

parametric equations: x = -3t, y = 0, z = t
Point: (3, -6); parallel vector: (-5, -1)

Point: (4, 6); parallel vector: (-6, -6)

9. Vector equation: (x, y, z) = (-3, 1, 0) + t₁(0, -3, 6) + t₂(-5, 1, 2);

parametric equations: x = -3 - 5t₂, y = 1 - 3t₁ + t₂, z = 6t₁ + 2t₂

11. Vector equation: (x, y, z) = (-1, 1, 4) + t₁(6, -1, 0) + t₂(-1, 3, 1);

parametric equations: x = -1 + 6t₁ - t₂, y = 1 - t₁ + 3t₂, z = 4 + t₂
A possible answer is vector equation: (x, y) = t(3, 2);

parametric equations: x = 3t, y = 2t

A possible answer is vector equation: (x, y, z) = t₁(0, 1, 0) + t₂(5, 0, 4);

parametric equations: x = 5t₂, y = t₁, z = 4t₂
17. x₁ = -s - t, x₂ = s, x₃ = t

9 * xi = jr- -^-s- yA x 2 = -yr+ys+y*, X 3 = r, *4 = 5 , = t 


21 . 


a. (1, 0, 0) + s(-1, 1, 0) + t(-1, 0, 1)

a plane in R³ passing through P(1, 0, 0) and parallel to (-1, 1, 0) and (-1, 0, 1)


23. a. x + y + z = 0
-2x + 3y = 0

b. a line through the origin in R³

c. x = -(3/5)t, y = -(2/5)t, z = t
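The line of intersection in Exercise 23 can be checked by substituting the parametric solution x = -(3/5)t, y = -(2/5)t, z = t into both equations x + y + z = 0 and -2x + 3y = 0. A short Python sketch with exact fractions (the function names are illustrative):

```python
from fractions import Fraction as F


def point_on_line(t):
    """Parametric solution: x = -(3/5)t, y = -(2/5)t, z = t."""
    t = F(t)
    return (-F(3, 5) * t, -F(2, 5) * t, t)


def on_both_planes(p):
    """True if p satisfies x + y + z = 0 and -2x + 3y = 0."""
    x, y, z = p
    return x + y + z == 0 and -2 * x + 3 * y == 0


assert all(on_both_planes(point_on_line(t)) for t in range(-10, 11))
print(point_on_line(5))  # (Fraction(-3, 1), Fraction(-2, 1), Fraction(5, 1))
```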

25. a 2,1, 

a X\ = - yS + y*, X2 = S, *3 =t 

c - *1 = 1 — -|s4= y*, ^2 = S, *3 = 1 +1 

7- xi = j — ys — y*, *2 = s, X 2 = t, * 4 = 1 ; The general solution of the associated homogeneous system is x\ - 
particular solution of the given system is x i = -y, *2 = 0, *3 = 0, 7:4 = 1. 

True/False 3.4 

(a) True
(b) False
(c) True
(d) True
(e) False
(f) True

Exercise Set 3.5 

1. a. (32, -6, -4)

b. (-14, -20, -82)
c. (27, 40, -42)

3. (18, 36, -18)

5. (-3, 9, -3)

7. √59
9. fm 
3 

13. 7 

15. i /374 


4 1 

“I*” 3*' X2 = s ’ * 3 = 


16 

The vectors do not lie in the same plane. 
21. -92

23. abc
25. a. -3
b. 3
c. 3

27. a. √26/2

b. √26/3

29. 2(v × u)


37. 


a. 17/6

b. 1/2


True/False 3.5 

(a) True

(b) True

(c) False

(d) True

(e) False

(f) False


t, X 4 = 0. A 



Chapter 3 Supplementary Exercises 


a. 3v - 2u = (13, -3, 10)

b. ||u + v + w|| = √70

c. fm 

d ’ projw“= — 27"( 2 ’ -5, -5 ) 


profeu = - 


e. u · (v × w) = -122

f. (-5v + w) × ((u · v)w) = (-3150, -2430, 1170)

3. a. 3v - 2u = (-5, -12, 20, -2)

b. ||u + v + w|| = √106

c. 2810 

d ' P ro Jw u= — ^y(9, 1, -6, -6) 

Not an orthogonal set 

A line through the origin, perpendicular to the given vector. 

A plane through the origin, perpendicular to the given vector. 

{ 0 } (the origin) 

A line through the origin, perpendicular to the plane containing the two noncollinear vectors. 


9. True 


11. S(-1, -1, 5)



17. Vector equation: (x, y, z) = (-2, 1, 3) + t₁(1, -2, -2) + t₂(5, -1, -5);

parametric equations: x = -2 + t₁ + 5t₂, y = 1 - 2t₁ - t₂, z = 3 - 2t₁ - 5t₂
19. Vector equation: (x, y) = (0, -3) + t(8, -1);

parametric equations: x = 8t, y = -3 - t

A possible answer is vector equation: (x, y) = (0, -5) + t(1, 3); parametric equations: x = t, y = -5 + 3t

23. 3(x + 1) + 6(y - 5) + 2(z - 6) = 0
25. -18(x - 9) - 51y - 24(z - 4) = 0
A plane 

Exercise Set 4.1 

1. (a) u + v = (2, 6), 3u = (0, 6) 

Axioms 1-5 

3. The set is a vector space with the given operations.

Not a vector space; Axioms 5 and 6 fail.

Not a vector space. Axiom 8 fails. 

The set is a vector space with the given operations. 

The set is a vector space with the given operations. 

True/False 4.1 
(a) False
(b) False
(c) True
(d) False
(e) False

Exercise Set 4.2 

(a), (c), (e)

3. (a), (b), (d) 

5. (a), (c), (d) 

7. (a), (b), (d) 

9. (a), (b), (c) 


11 . 


a. The vectors span
b. The vectors do not span
c. The vectors do not span
d. The vectors span

The polynomials do not span 

a- Line; x = y = -| t, z = t 

b. Line; x = 2t, y = t, z = 0
c. Origin

d. Origin

e. Line; x = -3t, y = -2t, z = t

f. Plane; x - 3y + z = 0

True/False 4.2 

(a) True
(b) True
(c) False
(d) False
(e) False
(f) True
(g) True
(h) False
(i) False
(j) True
(k) False

Exercise Set 4.3 

a. u₂ is a scalar multiple of u₁.

b. The vectors are linearly dependent by Theorem 4.3.3.

c. p₂ is a scalar multiple of p₁.

d. B is a scalar multiple of A.

I. None 

They do not lie in a plane. 

They do lie in a plane. 

(h) vi = |v 2 - |v 3 , v 2 = + |v 3> v 3 = - + |v 2 

9 *= 4’ a=1 

They are linearly independent since v₁, v₂, and v₃ do not lie in the same plane when they are placed with their initial points at the origin.
They are not linearly independent since v₁, v₂, and v₃ lie in the same plane when they are placed with their initial points at the origin.
21. W(x) = -x sin x - cos x ≠ 0 for some x.

23. a. W(x) = e^x ≠ 0

b. W(x) = 2 ≠ 0

25. W(x) = 2 sin x ≠ 0 for some x.

True/False 4.3 

(a) False
(b) True
(c) False
(d) True
(e) True
(f) False
(g) True
(h) False

Exercise Set 4.4 


1 . 


A basis for R² has two linearly independent vectors.

A basis for R³ has three linearly independent vectors.
A basis for P₂ has three linearly independent vectors.
A basis for M₂₂ has four linearly independent vectors.


(a), (b) 

a. (w)_S = (3, -7)

a. (v)_S = (3, -2, 1)

b. (v)_S = (-2, 0, 1)
11. (A)_S = (-1, 1, -1, 3)

13. A = A₁ - A₂ + A₃ - A₄
15. p = 7p₁ - 8p₂ + 3p₃
a. (2, 0)

(0, 1)



True/False 4.4 

(a) False
(b) False
(c) True
(d) True
(e) False

Exercise Set 4.5 

Basis: (1, 0, 1); dimension = 1

3. Basis: (4, 1, 0, 0), (—3, 0, 1, 0), (1, 0, 0, 1); dimension = 3 
No basis; dimension = 0 



(1, 1, 0), (0, 0, 1)

c. (2, -1, 4)

d. (1, 1,0), (0, 1, 1) 

9. a. n

b. n(n + 1)/2

c. n(n + 1)/2

Any two of (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1) can be used. 
v₃ = (a, b, c) with 9a - 3b - 5c ≠ 0

True/False 4.5 

(a) True
(b) True
(c) False
(d) True
(e) True
(f) True
(g) True
(h) True
(i) True
(j) False

Exercise Set 4.6 





1 . 


[ w ]*= 


b. 


[ w ] S = 


[ w ]tf = 


3 

-7 

28 

_3_ 

14 


b-a 

2 


a. (p)_S = (4, -3, 1), [p]_S = [4; -3; 1]

b. (p)_S = (0, 2, -1), [p]_S = [0; 2; -1]


a. w = (16, 10, 12)

b. q = 3 + 4x²


B = 


15 -1 
6 3 


b. 


11 _i 

10 2 

-t • 

o -f 


b. 


[w ] B = 

3 2 

-2 =3 

5 1 

[w] B = 


1 Z 

10 

8 

5 

5 
2 

2 

6 


[w ] B ' = 


9 

-9 

5 


. [w ] B ' = 


_7 

2 

23 

2 

6 


11 . 


13 . 


(b) 


(c) 


(d) 


(a) 


(b) 


2 0 
1 3 


i 0 

1 1 

“6 3 


Wb 


-[4 


Wb' = 


l 

-2 


(d) 


1 2 3 

2 5 3 

1 0 8 
-40 16 9 

13 -5 -3 
5 -2 -1 
-239' 


[w ] B = 


(e) 


[ w ] S = 


11 

30 


Ms= 


[w] B = 


5 

-3 

1 

-200 

64 

25 


































15 . 


(a) 


(b) 


3 5 

=1 -2 
2 5 

-1 -3 


(d) [w] Bl = 

2 " 

-1_ 

, [w] b 2 = 

~-f 

1_ 

E) WBj = 

3" 

-1 

, W By = 

4" 

-1 


17 . 


* i 


=2 -3 - 
5 1 


b. 


[w ] b 1 = 


[w] b 2 = 


_1_ 

2 

23 

2 

6 


19 . 


23 . 


[cos 2θ sin 2θ; sin 2θ -cos 2θ]

a. B = {(1, 1, 0), (1, 0, 2), (0, 2, 1)}

b H(M- 


2 2 
5' 5 


•«} 


True/False 4.6 

(a) True

(b) True

(c) True

(d) True

(e) False

(f) False

Exercise Set 4.7 

1. r₁ = (2, -1, 0, 1), r₂ = (3, 5, 7, -1), r₃ = (1, 4, 2, 7); c₁ = [2; 3; 1], c₂ = [-1; 5; 4], c₃ = [0; 7; 2], c₄ = [1; -1; 7]


1 
4 

b is not in the column space of A. 


1 


-1 


1 


5 

9 

-3 

3 

4= 

1 

= 

1 

1 


1 


1 


-1 


d. 


r 


~-r 

1 

+«-d 

1 

-1 


-1 


= -26 


+ t 


4=13 


b. 


1 

0 

-2 

7 

0 

-1 

0 

0 

0 


4 -t 


■ t 


+ t 


4-r 



1 

3 


1 

-f 

; t 

-1 

. 

1 


-7 


4=4 



-1 


-2 


2 


-1 


-2 


0 


0 


1 


0 


0 

4 ‘S 

1 

+ t 

0 

; r 

0 

4=s 

1 

4= t 

0 


0 


1 


0 


0 


1 








































































d. 


6 


7 


1 


1 


1 

5 


5 


5 


5 


5 

7 


4 


3 


4 


3 

5 


5 


5 

; s 

5 


5 

0 


1 


0 


1 


0 

0 


0 


1 


0 


1 


a. r₁ = [1 0 2], r₂ = [0 0 1]; c₁ = [1; 0; 0], c₂ = [2; 1; 0]

b. r₁ = [1 -3 0 0], r₂ = [0 1 0 0]; c₁ = [1; 0; 0; 0], c₂ = [-3; 1; 0; 0]

c. r₁ = [1 2 4 5], r₂ = [0 1 -3 0], r₃ = [0 0 1 -3], r₄ = [0 0 0 1]; c₁ = [1; 0; 0; 0; 0], c₂ = [2; 1; 0; 0; 0], c₃ = [4; -3; 1; 0; 0], c₄ = [5; 0; -3; 1; 0]

d. r₁ = [1 2 -1 5], r₂ = [0 1 4 3], r₃ = [0 0 1 -7], r₄ = [0 0 0 1]; c₁ = [1; 0; 0; 0], c₂ = [2; 1; 0; 0], c₃ = [-1; 4; 1; 0], c₄ = [5; 3; -7; 1]


a. 

T 


'2" 

ri = [ 1 0 2]; r 2 = [0 0 1 ]; ci = 

0 

; c 2 — 

1 


0 


0 


b. 

r 


ri = [l —3 0 0];r 2 =[0 1 0 0]; ci = 

0 

0 

; c 2 = 


0 


c. ri = [ 1 2 4 5 ]; r 2 = [ 0 1 -3 0 ]; r 3 = 

:o 

0 1 - 


-3 

1 

0 

0 

3]; 


r 4 = [0 0 0 1 ]; ci = 


r 4 = [0 0 0 1 ]; ci = 


l 


2 


4 


5 

0 


1 


-3 


0 

0 

; c 2 = 

0 

; c 3 = 

1 

; c 4 = 

-3 

0 


0 


0 


1 

0 


0 


0 


0 

= [0 1 4 

3]; r 3 = 

[0 0 

1 -7]; 

1 


2 


-1 


5 

0 


1 


4 


3 

0 

; c 2 = 

0 

; c 3 = 

1 

; c 4 = 

-7 

0 


0 


0 


1 


11 . 


15 . 


17 . 


a. ( 1 , 1 , -4-3), (0,1, -5, -2), (o, 0, 1, 

b * ( 1 , - 1 , 2 , 0 ), ( 0 , 1 , 0 , 0 ), | 0 , 0 , 1 , -i) 

c (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1) 

(b) 


0 0 0 
0 1 0 
0 0 1 


- 

[3* - 


-5a 

5b 


for all real numbers a, b not both 0. 


Since A and B are invertible, their null spaces are the origin. The null space of C is the line 3* = 0- The null space of D is the entire xj^-plane. 


True/False 4.7 

(a) True

(b) False

(c) False
(d) False
(e) False
(f) True
(g) True
(h) False
(i) True
(j) False

Exercise Set 4.8 

1. Rank(A) = Rank(Aᵀ) = 2

a. 2; 1 

b. 1 ; 2 

c. 2; 2 

d. 2; 3 

e. 3; 2 

5. a. Rank = 4, nullity = 0

b. Rank = 3, nullity = 2

c. Rank = 3, nullity = 0

a. Yes, 0

b. No

c. Yes, 2

d. Yes, 7
e. No
f. Yes, 4
g. Yes, 0

9. & l = r, = s, &3 = 4s — 3r, 64 = 2r — s, b$ = %s — Ir 

11. No 

13. The rank is 2 if r = 2 and s = 1; the rank is never 1.



True/False 4.8 


(a) False
(b) True
(c) False
(d) False
(e) True
(f) False
(g) False
(h) False
(i) True
(j) False

Exercise Set 4.9 

Domain: codomain: 

Domain: codomain: g} 

Domain: codomain: 

Domain: g 6 ; codomain: 

3. R², R³, (-1, 2, 3)

a. Linear; p? _» 

Nonlinear; g? 




c. Linear; g? _► 

Nonlinear; £ 4 _ ¥ g 2 

(a) and (c) are matrix transformations; (b), (d), and (e) are not matrix transformations. 


9. 


11 . 


[3 5 -1; 4 -1 1; 3 2 -1]; T(-1, 2, 4) = (3, -2, -3)
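The standard matrix listed in this answer acts on (-1, 2, 4) through an ordinary matrix-vector product, row by row. A minimal Python check, with `apply` as an illustrative helper name:

```python
A = [[3, 5, -1],
     [4, -1, 1],
     [3, 2, -1]]


def apply(matrix, x):
    """Return the matrix-vector product matrix @ x as a tuple."""
    return tuple(sum(a * b for a, b in zip(row, x)) for row in matrix)


print(apply(A, (-1, 2, 4)))  # (3, -2, -3)
```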


25. 


13. 


15. 


17. 


19. 


0 1 
-1 0 
1 3 

1 -1 

7 2—11 
0 1 10 
-10 0 0 

0 0 0 
0 0 0 
0 0 0 
0 0 0 
0 0 0 

d. 0 0 0 1 

10 0 0 

0 0 10 

0 1 0 0 

10-10 

a. T(-1, 4) = (5, 4)

b. T(2, 1, -3) = (0, -2, 0)

a. (2, -5, -3)

b. (2, 5, 3)

c. (-2, -5, 3)

a. (-2, 1, 0)

b. (-2, 0, 3)
c. (0, 1, 3)


’ b 


1/3 —2 1+2/3 


b. (0, 1.2/2) 

c. (-1. -2.2) 


21 . 


{ L±i -1+2/3 


■ 2 , 

£. 

b. (-2/2, 1,0) 

c (1,2,2) 


29. 


Twice the orthogonal projection on the x-axis.
Twice the reflection about the x-axis.


Rotation through the angle 2θ.

33. Rotation through the angle θ and translation by x₀; not a matrix transformation since x₀ is nonzero.
35. A line in Rⁿ.

True/False 4.9 

(a) False
(b) False
(c) False
(d) True
(e) False
(f) True
(g) False
(h) False
(i) True

Exercise Set 4.10 


1 . 

' 5 -1 21" 


'-8 -3 f 

II 

0 

10 “8 4 

, T A oT b = 

00 

1 

m 

7 

1 


45 3 25 


44 -11 45 



b - T 2 o 7*! 



T\oT 2 = 


4 

-4 


c. T₂(T₁(x₁, x₂)) = (3x₁ + 3x₂, 6x₁ - 2x₂)


T₁(T₂(x₁, x₂)) = (5x₁ + 4x₂, -4x₂)


b. 


-1 0 0 ~ 

0 0 0 
0 0 1 

1 0 f 

0 /2 0 

-1 0 1 _ 

-1 0 0 ] 

0 1 0 
0 0 0 

a. T₁ ∘ T₂ = T₂ ∘ T₁

b. T₁ ∘ T₂ = T₂ ∘ T₁

c. T₁ ∘ T₂ ≠ T₂ ∘ T₁



1 0 

0 -1 

0 O' 

0 i 

3 0 
0 -3 


11 . 


13. 


Not one-to-one 

b. One-to-one 

c. One-to-one 

d. One-to-one 

e. One-to-one 
One-to-one 
One-to-one 


One-to-one; 


Not one-to-one 
One-to-one; 
Not one-to-one 


T '("l. W2) = (jwi-|w 2 . + 

F? ■;]• t ~‘ 


(wi, w 2 ) = (-w 2 , -Wi) 


Reflection about the x-axis 
Rotation through the angle 

Contraction by a factor of -j 

Reflection about the yz-plane 
Dilation by a factor of 5 























17. 


19. 


21 . 


23. 


25. 


27. 


29. 


Matrix operator 
Not a matrix operator 
Matrix operator 
Not a matrix operator 

Matrix transformation 
Matrix transformation 

-1 0 
0 0 
b. 0 1 
-1 0 
0 0 
3 0 

a. T_A(e₁) = (-1, 2, 4), T_A(e₂) = (3, 1, 5)

b. T_A(e₁ + e₂ + e₃) = (2, 5, 6)

c. T_A(7e₃) = (0, 14, -21)
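Because a matrix transformation sends e₁, e₂, e₃ to the columns of A, part b is the sum of the three columns and part c is seven times the third. In the Python sketch below, T_A(e₃) = (0, 2, -3) is recovered from part c by dividing by 7; that recovery is an assumption about how the parts fit together, and `T_A` is an illustrative name.

```python
# T_A(e3), recovered from part c: T_A(7 e3) = (0, 14, -21).
e3_image = tuple(c // 7 for c in (0, 14, -21))
columns = [(-1, 2, 4), (3, 1, 5), e3_image]  # columns of A = T_A(e1), T_A(e2), T_A(e3)


def T_A(x):
    """Apply A to x as a linear combination of its columns."""
    return tuple(sum(col[i] * xj for col, xj in zip(columns, x)) for i in range(3))


print(e3_image)         # (0, 2, -3)
print(T_A((1, 1, 1)))   # (2, 5, 6), part b
print(T_A((0, 0, 7)))   # (0, 14, -21), part c
```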

a. Yes 
Yes 


(b) 


T{x l, *2) = (*? + *2> 'w) 


The range of T is a proper subset of 
T must map infinitely many vectors to 0. 

True/False 4.10 

(a) False
(b) True
(c) True
(d) False
(e) False
(f) False

Exercise Set 4.11 

1 . , 


b. 


d. 


0 -1 
-1 0 

-1 0 

0 - 1 
1 0 

0 °, 

0 0 
0 1 


a. 

1 

0 


0 


0 

1 


0 


0 

0 


-1 

b. 

1 


0 

0 


0 


•1 

0 


0 


0 

1 

c. 


■1 

0 

0 



0 

1 

0 



0 

0 

1 

a. 

'0 


■1 

0 


1 


0 

0 


0 


0 

1 

b. 

1 

0 


0 


0 

0 


-1 


0 

1 


0 

c. 


0 

0 

1 



0 

1 

0 



■1 

0 

0 


T_A(e₃) = (0, 2, -3)


Rectangle with vertices at (0, 0), (-3, 0), (0, 1), (-3, 1)












9. 


11 . 


13. 


1 0 
4 1 
1 -2 
0 1 


Expansion by a factor of 3 in the x-direction 

Expansion by a factor of 5 in the y-direction and reflection about the x-axis 
Shearing by a factor of 4 in the x-direction 


1 0 
0 5 
1 0 " 

2 5 . 

0 -1 
-1 0 


17. 


y = T 


y 


y = x 

¥ 

y = —2x 

V= L i±iH 


11 


19. 

23. 


No 


1 0 k 
0 1 k 
0 0 1 

Shear in the xz-direction with factor k maps (x, y, z) to (x + ky, y, z + ky):

[1 k 0]
[0 1 0]
[0 k 1]

Shear in the yz-direction with factor k maps (x, y, z) to (x, y + kx, z + kx):

[1 0 0]
[k 1 0]
[k 0 1]


True/False 4.11 

(a) False
(b) True
(c) True
(d) True
(e) False
(f) False
(g) True

Exercise Set 4.12 


1 . 


Stochastic 
Not stochastic 
Stochastic 
Not stochastic 

[0.54545; 0.45455]

Regular
Not regular
Regular

[8/17; 9/17]


7 . 



















9. 


11 . 


13. 


Probability that something in state 1 stays in state 1 
Probability that something in state 2 moves to state 1 
0.8 
0.85 

[0.95 0.55; 0.05 0.45]
0.93 
0.142 
0.63 


15. 


Year       1        2        3        4        5
City       95,750   91,840   88,243   84,933   81,889
Suburbs    29,250   33,160   36,757   40,067   43,111

City       46,875
Suburbs    78,125
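The city/suburb table is reproduced exactly by the transition matrix P = [0.95 0.03; 0.05 0.97] applied to initial populations of 100,000 in the city and 25,000 in the suburbs. Both P and the initial vector are assumptions inferred from the tabulated figures, not stated in this answer. The Python sketch below iterates x ← Px for five years and then computes the long-run split.

```python
P = [[0.95, 0.03],
     [0.05, 0.97]]          # assumed transition matrix (columns: from city, from suburbs)
x = [100_000.0, 25_000.0]   # assumed initial populations (city, suburbs)


def step(P, x):
    """One year of migration: x_next = P x."""
    return [P[0][0] * x[0] + P[0][1] * x[1],
            P[1][0] * x[0] + P[1][1] * x[1]]


history = []
for year in range(1, 6):
    x = step(P, x)
    history.append((year, round(x[0]), round(x[1])))
    print(history[-1])

# Long run: for P = [[1 - a, b], [a, 1 - b]] the steady-state city share is b / (a + b).
a, b = P[1][0], P[0][1]
total = 125_000
city_ss = total * b / (a + b)
print(round(city_ss), round(total - city_ss))  # 46875 78125
```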


17. 


23 

100 

46 


159 

22 

53 

47 


159 

35, 50, 35 


19. 


P = [7/10 1/10 1/5; 1/5 3/10 1/2; 1/10 3/5 3/10], q = [1/3; 1/3; 1/3]

Pᵏq = q for every positive integer k
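The stochastic matrix in Exercise 19 also has row sums equal to 1, which is exactly why the uniform vector q = (1/3, 1/3, 1/3) satisfies Pq = q and hence Pᵏq = q for every k. A Python sketch with exact fractions that checks both properties and iterates the first ten powers (`mat_vec` is an illustrative helper):

```python
from fractions import Fraction as F

P = [[F(7, 10), F(1, 10), F(1, 5)],
     [F(1, 5), F(3, 10), F(1, 2)],
     [F(1, 10), F(3, 5), F(3, 10)]]
q = [F(1, 3)] * 3


def mat_vec(M, v):
    """Matrix-vector product with exact arithmetic."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Column sums equal 1, so P is stochastic.
assert all(sum(P[i][j] for i in range(3)) == 1 for j in range(3))
# Row sums also equal 1, which makes the uniform vector a fixed point.
assert all(sum(row) == 1 for row in P)
assert mat_vec(P, q) == q

v = q
for k in range(1, 11):
    v = mat_vec(P, v)
    assert v == q  # P^k q = q for k = 1, ..., 10
print(v)
```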

True/False 4.12 

(a) True

(b) True

(c) True

(d) False

(e) True

Chapter 4 Supplementary Exercises 

1. (a) u + v=(4,3,2), -u= (-3,0,0) 

Axioms 1-5 

If s ≠ 1, -2, the solution space is the origin. If s = 1, the solution space is a plane through the origin. If s = -2, the solution space is a line through the origin.
A must be invertible
9. a. Rank = 2, nullity = 1
b. Rank = 2, nullity = 2
c. Rank = 2, nullity = n - 2




1 , x 2 , x 4 , .. 

... X 

2 3 

x , x , x , .. 

... x : 


| where 2m = n if n is even and 2m =n — 1 if n is odd. 

-I 


11 . 





























13. 


a. 


1 0 0 
0 0 0 
0 0 0 



o 

o 


o 

o 


o 

o 

o 


o 

o 

o 



1 0 0 

, 

0 0 0 


0 1 0 

, 

0 0 1 



o 

o 

o 


o 

o 


o 

o 

o 


o 

o 



0 0 0 
0 0 0 
0 0 1 


0 1 o' 


o 

o 


o 

o 

o 

1 

10 0 

, 

0 0 0 

, 

0 0 1 

l 

o 

o 

o 


o 

o 

7 


o 

o 

J 


Possible ranks are 2, 1, and 0. 

Exercise Set 5.1 

1. 5 

a. λ² - 2λ - 3 = 0
b. λ² - 8λ + 16 = 0

c. λ² - 12 = 0

d. λ² + 3 = 0

e. λ² = 0

f. λ² - 2λ + 1 = 0


Basis for eigenspace corresponding to λ = 3:


11 . 


13. 


15. 


b. 


Basis for eigenspace corresponding to λ = 4:


Basis for eigenspace corresponding to λ = √12:


There are no eigenspaces. 

Basis for eigenspace corresponding to λ = 0:

f 

Basis for eigenspace corresponding to λ = 1:

a. 1,2,3 

b. -√2, 0, √2

c. -8

d. 2
e. 2 

1 “4, 3 


a. λ⁴ + λ³ - 3λ² - λ + 2 = 0


b. λ⁴ - 8λ³ + 19λ² - 24λ + 48 = 0
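Both characteristic polynomials factor over the integers: (λ - 1)²(λ + 1)(λ + 2) for part a and (λ - 4)²(λ² + 3) for part b, so the real eigenvalues are 1, -1, -2 in part a and 4 in part b. The Python sketch below confirms those factorizations by multiplying the factors back out; `poly_mul` is an illustrative helper that stores coefficients highest degree first.

```python
from functools import reduce


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (highest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# a. (λ - 1)^2 (λ + 1)(λ + 2)
pa = reduce(poly_mul, [[1, -1], [1, -1], [1, 1], [1, 2]])
# b. (λ - 4)^2 (λ^2 + 3)
pb = reduce(poly_mul, [[1, -4], [1, -4], [1, 0, 3]])

print(pa)  # [1, 1, -3, -1, 2]
print(pb)  # [1, -8, 19, -24, 48]
```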


A= 1: basis 


b. 


A = 4:basis 



~o" 


0 

’ 

0 


1 


A = — 2:basis 


-1 

0 

1 

0 


; basis for eigenspace corresponding to A = — 1: 


tf\2 

1 


; basis for eigenspace corresponding to A = — 


3 

{n 

1 


T 


'o' 

_ 0 _ 

9 

_ 1 _ 

T 


o' 

_ 0 _ 

7 

_ 1 _ 


; A= — l:basis 


-2 

1 

1 

0 


2 y = 512 


(if—L 

UJ 512 
a. y = x and y = 2x
b. No lines
c. y = 0


True/False 5.1 

(a) False

(b) False

(c) True

(d) False
(e) True

(f) False

(g) False

Exercise Set 5.2 

Possible reason: Determinants are different. 
Possible reason: Ranks are different. 

5. λ = 0: 1 or 2; λ = 1: 1; λ = 2: 1, 2, or 3

Not diagonalizable 
Not diagonalizable 
Not diagonalizable 


13. 

l 1 ol 



1 r 1 


0 " 


P = 

3 


P' 






_ 1 1 



n 

L° 


— 1 


15. 

_ -2 0 

r 



■3 

0 l 

P = 

0 1 

0 

1 

% 


0 

3 l 


1 C 

) 

0 



0 

0 : 

17. 

~1 2 

f 



"1 

0 

o' 

P = 

1 3 

3 

j 

P-'AP = 

0 

1 2 

0 


1 3 

4 



0 

1 0 

3 


19. 

0 

0 


O 

O 

O 

P = 

1 

1 

00 0 

0 —*■ 

. 1-1 0 

; P~ l AP = 

0 0 0 

0 0 1 


21 . 


P = 


"1 

0 

0 

o' 


-2 

0 

0 

0 " 

0 

1 

1 

-1 

, P~ l AP = 

0 

-2 

0 

0 

0 

0 

1 

0 

0 

0 

3 

0 

0 

0 

0 

1 


0 

0 

0 

3 


23. 


-1 10237 -2047 
0 1 0 
0 10245 -2048 


25. 


A” = PD”P~ l = 


"1 

1 

f 

'r 

0 

0 " 

2 

0 -1 

0 

3 ” 

0 

1 

-1 

1 

0 

0 

4 ” 


1 

3 

0 - 

1 

'3 


27, 


One possibility is P = [-b -b; a - λ₁ a - λ₂], where λ₁ and λ₂ are as in Exercise 20 of Section 5.1.

a. λ = 1: dimension = 1; λ = 3: dimension ≤ 2; λ = 4: dimension ≤ 3
b. Dimensions will be exactly 1, 2, and 3.
c. λ = 4

True/False 5.2 

(a) True
(b) True

(c) True

(d) False

(e) True
(f) True

(g) True
(h) True

Exercise Set 5.3 

1. ū = (2 + i, -4i, 1 - i), Re(u) = (2, 0, 1), Im(u) = (-1, 4, 1), ||u|| = √23
5. x = (7 - 6i, -4 - 8i, 6 - 12i)


7. Ā = [5i 4; 2 + i 1 - 5i], Re(A) = [0 4; 2 1], Im(A) = [-5 0; -1 5], det(A) = 17 - i, tr(A) = 1
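For Exercise 7 the entries of A itself can be recovered by conjugating Ā, which gives A = [-5i 4; 2 - i 1 + 5i]; this is an inference, since A is not printed in the answer. The Python sketch below uses built-in complex numbers to recompute Ā, Re(A), Im(A), det(A), and tr(A) from that A.

```python
# Assumed A, obtained by conjugating the listed conjugate matrix.
A = [[-5j, 4],
     [2 - 1j, 1 + 5j]]

conj = [[complex(z).conjugate() for z in row] for row in A]
re_part = [[complex(z).real for z in row] for row in A]
im_part = [[complex(z).imag for z in row] for row in A]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 2 x 2 determinant
tr = A[0][0] + A[1][1]

print(re_part, im_part)
print(det, tr)  # (17-1j) (1+0j)
```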


11. u · v = -1 + i, u · w = 18 - 7i, v · w = 12 + 6i
13. —11 — 




































Ai = 2 — i, xi = 

Ai =4 — i, xi = 
9. |A|=/2, * = f 


21 . |A| = 

2, 0= - 

w 

3 


W 

II 

T— O ' 

1 

CM CM 

1 

_1 L 

, C = 

'3 —2 
_2 3_ 

25. p = 

27. a . 

b. 

1 -1 
-l o_ 

h- 8 - 
k---> 

None 

, C = 

■ 

5 —3 

.3 


2 — i 

1 

; A 2 = 2 + xi = 

2 =H 

1 

"l -i" 
1 

; A 2 = 4 + j, xi = 

'i+r 

1 


True/False 5.3 

(a) False
(b) True
(c) False
(d) True
(e) False
(f) False

Exercise Set 5.4 

1. a. $y_1 = c_1 e^{5x} - 2c_2 e^{-x}$, $y_2 = c_1 e^{5x} + c_2 e^{-x}$

b. $y_1 = 0$, $y_2 = 0$
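The general solution of Exercise 1 determines the coefficient matrix of $\mathbf{y}' = A\mathbf{y}$ through $A = PDP^{-1}$. With eigenvectors $(1, 1)$ and $(-2, 1)$ and eigenvalues $5$ and $-1$, this gives $A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$. The sketch below checks the two eigenpairs. It then checks that the printed solution satisfies $\mathbf{y}' - A\mathbf{y} = \mathbf{0}$ at a sample point. The constants and the sample point are arbitrary illustrative choices.

```python
import math

# Coefficient matrix implied by the general solution (A = P D P^{-1}).
A = [[1, 4], [2, 3]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Eigenpairs read off the general solution.
for lam, v in [(5, [1, 1]), (-1, [-2, 1])]:
    assert matvec(A, v) == [lam * x for x in v]

def y(x, c1, c2):
    """General solution y = c1 e^{5x} (1, 1) + c2 e^{-x} (-2, 1)."""
    return [c1 * math.exp(5 * x) - 2 * c2 * math.exp(-x),
            c1 * math.exp(5 * x) + c2 * math.exp(-x)]

def dy(x, c1, c2):
    """Exact derivative of y with respect to x."""
    return [5 * c1 * math.exp(5 * x) + 2 * c2 * math.exp(-x),
            5 * c1 * math.exp(5 * x) - c2 * math.exp(-x)]

# Residual y' - A y at an arbitrary point; it should vanish up to rounding.
x0, c1, c2 = 0.3, 2.0, -1.5
residual = [d - a for d, a in zip(dy(x0, c1, c2), matvec(A, y(x0, c1, c2)))]
print(residual)
```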

3. a. $y_1 = -c_2 e^{2x} + c_3 e^{3x}$, $y_2 = c_1 e^{x} + 2c_2 e^{2x} - c_3 e^{3x}$, $y_3 = 2c_2 e^{2x} - c_3 e^{3x}$
b. $y_1 = e^{2x} - 2e^{3x}$, $y_2 = e^{x} - 2e^{2x} + 2e^{3x}$, $y_3 = -2e^{2x} + 2e^{3x}$

7. $y = c_1 e^{3x} + c_2 e^{-2x}$
9. $y = c_1 e^{x} + c_2 e^{2x} + c_3 e^{3x}$

True/False 5.4 

(a) False
(b) False
(c) True
(d) True
(e) False

Chapter 5 Supplementary Exercises 

The transformation rotates vectors through the angle θ; therefore, if 0 < θ < π, then no nonzero vector is transformed into a vector in the same or opposite direction.


3. (c) $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}$


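$A^2 = \begin{bmatrix} 15 & 30 \\ 5 & 10 \end{bmatrix}$, $A^3 = \begin{bmatrix} 75 & 150 \\ 25 & 50 \end{bmatrix}$, $A^4 = \begin{bmatrix} 375 & 750 \\ 125 & 250 \end{bmatrix}$, $A^5 = \begin{bmatrix} 1875 & 3750 \\ 625 & 1250 \end{bmatrix}$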
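Each listed power is 5 times the previous one. That pattern is what the Cayley–Hamilton theorem predicts for a 2 × 2 matrix with characteristic polynomial $\lambda^2 - 5\lambda$: then $A^2 = 5A$, and hence $A^n = 5^{n-1}A$. The sketch assumes $A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}$ (that is, $A^2/5$). This choice is an inference from the printed powers, not something stated in this key. The code multiplies out the powers directly.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[3, 6], [1, 2]]   # assumed matrix: trace 5, determinant 0, so A^2 = 5A
powers = {1: A}
for n in range(2, 6):
    powers[n] = matmul(powers[n - 1], A)

for n in range(2, 6):
    print(n, powers[n])
```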


11 0, tr(4) 


They are all 0. 

is. r i o o 






















They are all 0, 1, or −1.

Exercise Set 6.1 

1. a. 5

b.

c. −3

d. √13

e. √5

f. √89

3. a. 2

b. 11

c. −13

d. −8

e. 0

5. a. −5

b. 1 

c. -7 

d. 1 

e. 1 

f. 1 

a. 3 

b. 56 

29 

{3 0 
0 {5 

b. [2 0 ■ 

0 {1 

a. ^74 

0 


a. /l05 

b. ^47 

7 . (p, q} = 50 , IIpII = 6^3 

a. 3√2

b. 3√5

c. 3√13


b. 


A? 



X 



For^=|^ ^ J , then [V, V) = — 2 < 0, so Axiom 4 fails. 










29. 


l _28 

15 

b. 0 

True/False 6.1 

(a) True
(b) False
(c) True
(d) True
(e) False
(f) True
(g) False

Exercise Set 6.2 



b. $3/\sqrt{73}$

c. 0

d. $20/(9\sqrt{10})$

e. $1/\sqrt{2}$

f. _2_ 

3. a. $19/(10\sqrt{7})$

b. 0

7. No 

9. a. k = −3

b. k = −2, −3

13. No 

a. x = t, y = -2t, z= -3t 

b. 2x — 4- Az = 0 

c. * —z = 0 

a. The line y = −x

b. The xz-plane
c. The x-axis

True/False 6.2 

(a) False
(b) True
(c) True
(d) True
(e) False
(f) False

Exercise Set 6.3 

1 . (a), (b), (d) 

3. (b), (d) 

5. (a) 



[kkh\ (' kk°\ ikk'fc) 



9. 


11 . 


13. 


a - -jVi + JV2 4* 2 v3 
—yvi -|v 2 + 4v 3 

7 1 5 

C. -|vj - yV 2 + yV 3 

h) u= - ^V! - j^v 2 + 0v 3 + lv 4 

a. W= M U[ _| U2 _I U3 


b * w = —=117 + 


11 


/6 


u 3 


15. 

a. 

(2 5 _ 

2 _21 



U’ 4’ 

4' 


b. 

/17 7 

1 23 ^ 



U2’ 4' ' 

12 ’ 12 J 

17. 

a. 

/ 23 11 

1 17 3 



\1S' 6 ’ 

18’ 18J 


b. 

/3 3 

i n 



U’ 2 ’ 

2 ' 2) 

19. 

a. 

—& 

i. -i. -i)..,- 


b. 

P 

5 3 5\ 


W1 = (?■ 

4 »■ “ 4 - ■ “ 4 } w 2 = 

21 . 

a. 

I i 

3 \ / 



V H^'W V2 “^ 





* 


b. VI = (1,0), v 2 = (0, -1) 


23. 


vi 


= fo, A. X o\ v 2 = Lx. - 1 2 o| 

^ /? /? J [\[30 /30 /30 J 


v 3 = 


/To’ /To’ 


2 

/To’ 


2 

/To 


v 4 = 


_j_L_ 

/T?’ /T?’ 


25. 


_2_3_ 

/T?’ /T? 




27 W1 = fl3 31 40.1 M _ J_ _2J 
1 U4' 14’ 14/ 2 U4’ 14 ’ 14 J 


29 . 


1 _ 2 _ 

f5 ~f5 

2 1 

f5 f5 


f5 f5 
0 {5 












J__i_ 

f2 

0 ~T 

l/3 

1 1 

f2 fl 


1 

3 

_2 

3 

2 

3 


8 


^234 

11 

/234 

7 

/234 


/2 3/2 
0 /3 


1 

3 

l/26 

3 


d. 


J__1_ 1 

{3 {l 

0 -L -2_ 

/6 


J_ J_ 

f2 f3 


{l {l {2 

. fi- L 

0 0 

ft 


J_ _/L 


I 

7* 

1 

0 


2/19 

__^L 

2 / 1 ? 

3|/2 

/l9 


_3_ 

/l9 

1_ 

3 

3^ 

/l9 

^ /l9 

1 

0 0 — 

/l9 

/l9 


Columns not linearly independent 
3. $\mathbf{v}_1 = 1$, $\mathbf{v}_2 = \sqrt{3}(2x - 1)$, $\mathbf{v}_3 = \sqrt{5}(6x^2 - 6x + 1)$
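The orthonormal polynomials $1$, $\sqrt{3}(2x - 1)$, $\sqrt{5}(6x^2 - 6x + 1)$ are what the Gram–Schmidt process produces from $1, x, x^2$ under the inner product $\langle p, q \rangle = \int_0^1 p(x)q(x)\,dx$. That inner product is an inference from the answer itself. The sketch below represents polynomials as coefficient lists, with the constant term first, and integrates monomials exactly via $\int_0^1 x^k\,dx = 1/(k+1)$.

```python
import math

def inner(p, q):
    """<p, q> = integral over [0, 1] of p(x) q(x); polynomials as coefficient lists (constant first)."""
    return sum(a * b / (i + j + 1) for i, a in enumerate(p) for j, b in enumerate(q))

def sub(p, q, c):
    """Return p - c*q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [a - c * b for a, b in zip(p, q)]

def gram_schmidt(basis):
    ortho = []
    for u in basis:
        v = list(u)
        for w in ortho:
            v = sub(v, w, inner(v, w))   # remove the component along w (w has unit norm)
        norm = math.sqrt(inner(v, v))
        ortho.append([c / norm for c in v])
    return ortho

v1, v2, v3 = gram_schmidt([[1.0], [0.0, 1.0], [0.0, 0.0, 1.0]])   # basis 1, x, x^2
print(v1, v2, v3)
```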

True/False 6.3 

(a) False
(b) False
(c) True
(d) True
(e) False
(f) True

Exercise Set 6.4 

X. 


a. 

'21 

25' 

*r 


~20~ 


25 

35 

/ 2 . 


20 


b. 

' 15 

-1 

5' 

■*r 


-l" 


-1 

22 

30 

*2 

= 

9 


5 

30 

45 

x 3 


13 


a. $x_1 = 5$, $x_2 = 2$

b. $x_1 = 12$, $x_2 = -3$, $x_3 = 9$


3 

2 

9 

2 

=3 


b. 


3 

-3 

0 

3 


Solution: 


n: x = j; least squares error: y ^5 

Solution: x = 0 j +1 (—3, 1) (t a real number); least squares error: y ^42 


































9. 


11 . 


c ' Solution: x = 0 j 4= £( — 1, — 1,1) (t a real number); least squares error: -^294 

a. (7, 2, 9, 5) 

b. /_ 12 _ 4 J_2_ 16A 

l 5’ 5' 5’ 5) 

$\det(A^TA) = 0$; $A$ does not have linearly independent column vectors.
$\det(A^TA) = 0$; $A$ does not have linearly independent column vectors.


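a. $[P] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

b. $[P] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$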


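15. a. (1, 0, −5), (0, 1, 3)

b. $[P] = \dfrac{1}{35}\begin{bmatrix} 10 & 15 & -5 \\ 15 & 26 & 3 \\ -5 & 3 & 34 \end{bmatrix}$

c. $\left(\dfrac{2x_0 + 3y_0 - z_0}{7}, \dfrac{15x_0 + 26y_0 + 3z_0}{35}, \dfrac{-5x_0 + 3y_0 + 34z_0}{35}\right)$

d. $\dfrac{3\sqrt{35}}{7}$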
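The projection matrix in Exercise 15(b) follows from the standard formula $[P] = A(A^TA)^{-1}A^T$, where the columns of $A$ are the basis vectors $(1, 0, -5)$ and $(0, 1, 3)$ from part (a). The sketch below evaluates that formula with exact fractions, so the result can be compared entry by entry with $\frac{1}{35}$ times the printed integer matrix.

```python
from fractions import Fraction

# Columns of A are the basis vectors (1, 0, -5) and (0, 1, 3) of the subspace.
A = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)], [Fraction(-5), Fraction(3)]]
At = [list(col) for col in zip(*A)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

AtA = matmul(At, A)                                   # 2 x 2 Gram matrix
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
AtA_inv = [[AtA[1][1] / det, -AtA[0][1] / det],
           [-AtA[1][0] / det, AtA[0][0] / det]]

P = matmul(matmul(A, AtA_inv), At)                    # [P] = A (A^T A)^{-1} A^T
print([[35 * v for v in row] for row in P])
```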


17. s = l = l 

$[P] = A^T(AA^T)^{-1}A$


True/False 6.4 

(a) True
(b) False
(c) True
(d) True
(e) False
(f) True
(g) False
(h) True


Exercise Set 6.5 


1. 

3. 


y ”2 + ? 
$y = 2 + 5x - 3x^2$


11 . 


y = 


5_,AS 
21 ^lx 



True/False 6.5 

(a) False
(b) True
(c) False
(d) True


Exercise Set 6.6 

a. $(1 + \pi) - 2\sin x - \sin 2x$

b. $(1 + \pi) - 2\left(\sin x + \dfrac{\sin 2x}{2} + \dfrac{\sin 3x}{3} + \cdots + \dfrac{\sin nx}{n}\right)$
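The coefficients above are exactly those of $f(x) = 1 + x$ on $[0, 2\pi]$. That function is inferred from the answer, not stated in this key. The coefficients are $\tfrac{a_0}{2} = 1 + \pi$, $a_k = 0$ and $b_k = -2/k$, with $a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx$ and $b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx$. As a sanity check, the sketch approximates these integrals with the midpoint rule. The sample count is an arbitrary choice.

```python
import math

def f(x):
    return 1 + x   # assumed function, inferred from the printed coefficients

def fourier_coeffs(f, n, samples=100_000):
    """Midpoint-rule approximations of a0 and a_k, b_k (k = 1..n) on [0, 2*pi]."""
    h = 2 * math.pi / samples
    xs = [(i + 0.5) * h for i in range(samples)]
    a0 = sum(f(x) for x in xs) * h / math.pi
    a = [sum(f(x) * math.cos(k * x) for x in xs) * h / math.pi for k in range(1, n + 1)]
    b = [sum(f(x) * math.sin(k * x) for x in xs) * h / math.pi for k in range(1, n + 1)]
    return a0, a, b

a0, a, b = fourier_coeffs(f, 3)
print(a0 / 2, a, b)   # compare with 1 + pi, [0, 0, 0], [-2, -1, -2/3]
```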


vanx 1 

* J 


b. n+ 
12 + 


1 ±_e 

2(1 -e) 


i i i i i i i i i i A 

10 


3 . 

















5. 



’■KlM 1 -'- 0 *]-*’ 

True/False 6.6 

(a) False
(b) True
(c) True
(d) False
(e) True


Chapter 6 Supplementary Exercises 
a. (0, a, a, 0) with a ≠ 0

b - ■ _2_ _L 01 

f5' F 


± 0, 


The subspace of all matrices in $M_{nn}$ with only zeros on the diagonal.
The subspace of all skew-symmetric matrices in $M_{nn}$.



K No 

0 approaches 

17. No 


Exercise Set 7.1 


1. 



3. 


(a) 


(b) 


1 0 
0 1 
1 

f2 

1 

~f2 


(d) 


f2 

ft 

f3 


(e) 


1 1 

2 2 

1 

2 6 

1 1 

2 6 

1 1 

2 6 


9 

12 

25 

25 

4 

3 

5 

5 

12 

16 

25 

25 


f2 

f2 


0 

_ 2 _ 

ft 

1 

2 

1 

6 

1 

6 

5 

6 


f2 

ft 

fl 


1 

2 

1 

6 

5 

6 

1 

6 


a. (— l + 3/3,3+/3) 

- (f-^5. 


9 . 











11 . 


A = 


A = 


■ 

3 

2 

cos 9 0 

—sin 9 

0 1 

0 

sin0 0 

cos 9 

1 0 

0 

0 cos 9 

sin 9 

0 —sin 6 

cos 9 


13. $a^2 + b^2 = \tfrac{1}{2}$

The only possibilities are a = ^ = j^r> c = "Jjfj 0 r a = ^ = c = 


vY 


Rotations about the origin, reflections about any line through the origin, and any combination of these 

Rotation about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these 

No; dilations and contractions 


True/False 7.1 

(a) False
(b) False
(c) False
(d) False
(e) True
(f) True
(g) True
(h) True

Exercise Set 7.2 


1. a. $\lambda^2 - 5\lambda = 0$; λ = 0: one-dimensional; λ = 5: one-dimensional

b. $\lambda^3 - 27\lambda - 54 = 0$; λ = 6: one-dimensional; λ = −3: two-dimensional

c. $\lambda^3 - 3\lambda^2 = 0$; λ = 3: one-dimensional; λ = 0: two-dimensional

d. $\lambda^3 - 12\lambda^2 + 36\lambda - 32 = 0$; λ = 2: two-dimensional; λ = 8: one-dimensional
e. $\lambda^4 - 8\lambda^3 = 0$; λ = 0: three-dimensional; λ = 8: one-dimensional

f. $\lambda^4 - 8\lambda^3 + 22\lambda^2 - 24\lambda + 9 = 0$; λ = 1: two-dimensional; λ = 3: two-dimensional
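The matrices in this exercise are symmetric, so each eigenspace dimension equals the multiplicity of the corresponding root of the characteristic polynomial. Each polynomial is therefore the product $\prod (\lambda - \lambda_i)$ taken over the eigenvalues with those multiplicities. The small helper below expands such a product so that the listed polynomials can be checked. The function name is illustrative only.

```python
def poly_from_roots(roots):
    """Coefficients (highest degree first) of the product of (lam - r) over the listed roots."""
    coeffs = [1]
    for r in roots:
        shifted = coeffs + [0]                      # current polynomial times lam
        scaled = [0] + [-r * c for c in coeffs]     # current polynomial times -r
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs

# Eigenvalues repeated according to the eigenspace dimensions listed above.
checks = {
    "a": ([0, 5], [1, -5, 0]),
    "b": ([6, -3, -3], [1, 0, -27, -54]),
    "c": ([3, 0, 0], [1, -3, 0, 0]),
    "d": ([2, 2, 8], [1, -12, 36, -32]),
    "e": ([0, 0, 0, 8], [1, -8, 0, 0, 0]),
    "f": ([1, 1, 3, 3], [1, -8, 22, -24, 9]),
}
for part, (roots, expected) in checks.items():
    print(part, poly_from_roots(roots), poly_from_roots(roots) == expected)
```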


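3. $P = \begin{bmatrix} -2/\sqrt{7} & \sqrt{3}/\sqrt{7} \\ \sqrt{3}/\sqrt{7} & 2/\sqrt{7} \end{bmatrix}$, $P^{-1}AP = \begin{bmatrix} 3 & 0 \\ 0 & 10 \end{bmatrix}$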
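Because $P$ is orthogonal, $P^{-1}AP = P^TAP$. The sketch checks this numerically. It assumes $A = \begin{bmatrix} 6 & 2\sqrt{3} \\ 2\sqrt{3} & 7 \end{bmatrix}$, which is not given in this key. That matrix was chosen because its eigenvalues are 3 and 10 and its unit eigenvectors are the columns of $P$ above.

```python
import math

s3, s7 = math.sqrt(3), math.sqrt(7)
A = [[6, 2 * s3], [2 * s3, 7]]                     # assumed matrix, consistent with the answer
P = [[-2 / s7, s3 / s7], [s3 / s7, 2 / s7]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(X):
    return [list(col) for col in zip(*X)]

PtP = matmul(transpose(P), P)                      # should be the identity
D = matmul(matmul(transpose(P), A), P)             # P^T A P = P^{-1} A P
print([[round(v, 10) for v in row] for row in PtP])
print([[round(v, 10) for v in row] for row in D])
```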


5. 



3 
5 
0 

4 

5 


P~ l AP = 


25 

0 

0 


0 0 
-3 0 

0 “50 


f 

1 

F = f 

f 


f 

f2 

0 




P~ l AP = 


0 0 0 
0 3 0 
0 0 3 


P = 


0 0 


i 5 oo 


0 0 - 


oo i i 


; P~ X AP~- 


-25 0 
0 25 


0 0 -25 0 
0 0 0 25 


5. No 
9. Yes 

True/False 7.2 



















(a) True
(b) True
(c) False
(d) True
(e) True
(f) False
(g) True


Exercise Set 7.3 

1. 


[*1 *2] 


b. 


[*1 *2] 


3 oir*i 

0 7 *2 


!]P 

[- 3 :»] 


[*1 *2*3] 


9 
3 

-4 1 


n] 

3 -4 

-1 

4 


3. $2x^2 + 5y^2 - 6xy$


1 1 

{l {l 

1 1 

> /2 

1 2 
■3 3 
2 1 
3 3 

1 2 

3 3 


[£} e=3^+^ 2 


Q=y\ +*y\+iyl 


[*?] 


b. 


[*?] 


2 0 


0 0 
0 


!] 


;] + [7 _8]p]-5 = 0 


11 . 


a. ellipse
b. hyperbola
c. parabola
d. circle


3. a. Hyperbola: $2(y')^2 - 3(x')^2 = 8$, θ ≈ −26.6°
b. Hyperbola: $4(y')^2 - (x')^2 = 3$, θ ≈ 36.9°


Positive definite 
Negative definite 
Indefinite 

Positive semidefinite 
Negative semidefinite 


Positive definite 
Positive semidefinite 
Indefinite 
27. * >2 


1 1 ... 1 

i4= n(n- 1) » »(»-l) 

: : ! 

1 1 ... 1 

»(»-!) »(»-!) « 


Yes 




























A must have a positive eigenvalue of multiplicity 2. 

True/False 7.3 

(a) True
(b) False
(c) True
(d) True
(e) False
(f) True
(g) True
(h) True
(i) False
(j) True
(k) False
(l) False


Exercise Set 7.4 

1. Maximum: 5 at (1, 0) and (−1, 0); minimum: −1 at (0, 1) and (0, −1)

Maximum: 7 at (0, 1) and (0, -1); minimum: 3 at (1, 0) and (-1, 0) 

Maximum: 9 at (1, 0, 0) and (-1,0, 0); minimum: 3 at (0, 0, 1) and (0, 0, -1) 

7. Maximum: $z = 4\sqrt{2}$ at $(x, y) = (2\sqrt{2}, 2)$ and $(-2\sqrt{2}, -2)$; minimum: $z = -4\sqrt{2}$ at $(x, y) = (-2\sqrt{2}, 2)$ and $(2\sqrt{2}, -2)$



Critical points: (-1, 1), relative maximum; (0, 0), saddle point 
Critical points: (0, 0), relative minimum; (2, 1) and (-2, 1), saddle points 


Corner points: x 
21 <7 00= A 



f2 


True/False 7.4 

(a) False
(b) True
(c) True
(d) False
(e) True


Exercise Set 7.5 

" -2 i 4 5- 

1+i 3-x 0 


'■ /*= 


1 


1 i 2-3 i 
A= -1 -3 1 

2 + 3i 1 2 

a. “13**31 

b . “22 * “22 


9. 


A = A~ L = 


11 . 


4 

—i + |/3 

2/2 

2/2 


1-i|/3 

2/2 

2/2 














13. 


15. 


17. 


P = 



f f • *-[ 

/3 _ 

-1 — i 1+r 

f f • H 




-2 0 0 
0 1 0 
0 0 5 


19. 


0 

i 

2- 


A = 

i 

0 

1 



-2-3 i 

-1 

Ai 

21. 

a. 

<*13* - 

031 



b. 

a\\* - 

au 


29. 

B and C must comn 

37. 


i 




/2 /2 

_l__ L 

f2 ~f2 

Multiplication of x by P corresponds to ‖u‖² times the orthogonal projection of x onto W = span(u). If ‖u‖ = 1, then multiplication of x by $H = I - 2\mathbf{u}\mathbf{u}^*$ corresponds to reflection of x about the hyperplane $\mathbf{u}^\perp$.
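In the real case, $\mathbf{u}^* = \mathbf{u}^T$ and $H\mathbf{x} = \mathbf{x} - 2(\mathbf{u}\cdot\mathbf{x})\mathbf{u}$. The reflection property can then be checked directly: $H$ sends $\mathbf{u}$ to $-\mathbf{u}$ and fixes every vector orthogonal to $\mathbf{u}$. The unit vector and test vector below are hypothetical sample values.

```python
import math

def householder_apply(u, x):
    """Return (I - 2 u u^T) x for a real unit vector u, without forming the matrix."""
    ux = sum(a * b for a, b in zip(u, x))
    return [xi - 2 * ux * ui for xi, ui in zip(x, u)]

u = [1 / math.sqrt(3)] * 3        # hypothetical unit vector
w = [1.0, -1.0, 0.0]              # orthogonal to u, so it lies in the hyperplane u-perp
print(householder_apply(u, u))    # u is sent to -u
print(householder_apply(u, w))    # w is left fixed
```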

True/False 7.5 

(a) False
(b) False
(c) True
(d) False
(e) False

Chapter 7 Supplementary Exercises 


1. 


-l 


3 1 
5 5 

1 2 

"5 5 


4 

0 


3" 

-1 


4 


9 

12' 

5 


5 



5 


25 

25 

9 

4 

12 



0 


4 

3 

25 

5 

25 




5 

5 

12 

3 

16 



3 


12 

16 

25 

5 

25 



■5 


25 

25 

1 

~f2 

1 

f2 

0 




'0 

0 

O' 


0 

0 

1 

; P r AP = 


0 

2 

0 


1 

1 

0 




0 

0 

1 


f2 

f2 









P = 


positive definite 
a. parabola 
parabola 

Exercise Set 8.1 

Nonlinear 

Linear 

Linear 


































7. 


a. Linear
b. Nonlinear


9. $T(x_1, x_2) = (-4x_1 + 5x_2, x_1 - 3x_2)$; $T(5, -3) = (-35, 14)$

11. $T(x_1, x_2, x_3) = (-x_1 + 4x_2 - x_3, 5x_1 - 5x_2 - x_3, x_1 + 3x_3)$; $T(2, 4, -1) = (15, -9, -1)$
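The formulas in Exercises 9 and 11 are matrix transformations. Their standard matrices can be read off row by row from the coefficients, and each image is then a matrix-vector product. The sketch reproduces the two printed values.

```python
def apply(M, v):
    """Matrix-vector product, returned as a tuple."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

# Standard matrices read off the formulas in Exercises 9 and 11.
T9 = [[-4, 5], [1, -3]]
T11 = [[-1, 4, -1], [5, -5, -1], [1, 0, 3]]

print(apply(T9, (5, -3)))
print(apply(T11, (2, 4, -1)))
```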
13. $T(2\mathbf{v}_1 - 3\mathbf{v}_2 + 4\mathbf{v}_3) = (-10, -7, 6)$


(a) 
7. (a) 
(a) 


21 . 


23. 


a. (1. -4) 

b - (1,0,0), (0,1,0), (|, -4,l) 


5 , 6 

Jl I 4 

b. f—14" 

19 

11 


c. Rank(T) = 2, nullity(T) = 1
Rank(A) = 2, nullity(A) = 1


25. 



c. Rank(T) = nullity(T) = 2
Rank(A) = nullity(A) = 2


Kernel: y-axis; range: xz-plane 
Kernel: x-axis; range: yz-plane 

Kernel: the line through the origin perpendicular to the plane y = x; range: plane y = x 


Nullity (T) = 2 
Nullity (T) = 4 
Nullity (T) = 3 

d. Nullity(T) = 1 

a. 3 

No 


A line through the origin, a plane through the origin, the origin only, or all of R³

35 - (b) No 


ker(D) consists of all constant polynomials. 

a. JV(z)W«(x) 

b. T(/(*))=/(”+%) 


True/False 8.1 

(a) True
(b) False
(c) True
(d) False
(e) True
(f) True
(g) False
(h) False
(i) False

Exercise Set 8.2 














a. ker(T) = {0}; T is one-to-one 

ker(T) = | 1 j| ; T is not one-to-one 

c. ker(T) = {0}; T is one-to-one
d. ker(T) = {0}; T is one-to-one
e. ker(T) = {t(1, 1)}; T is not one-to-one
f. ker(T) = {t(0, 1, −1)}; T is not one-to-one

a. Not one-to-one
b. Not one-to-one
c. One-to-one

a. ker(T) = {t(−1, 1)}

b. T is not one-to-one since ker(T) ≠ {0}.

T is one-to-one 
T is not one-to-one 
T is not one-to-one 
T is one-to-one 


11 . 


a. 


J a 

b 

c" 


b 

d 

e 

= 

\ c 

e 

/_ 






l\ a 

b 

)- 

b 

I* 

d_ 

r 

c 

A 


a b 
c d 


T{ax 5 + bx 1 + cx) = 


d. 


T{a + b sin(x) + c cos(x)) = 


T is not one-to-one since, for example, $f(x) = x^2(x - 1)^2$ is in its kernel.
Yes; it is one-to-one 

T is not one-to-one since, for example a is in its kernel. 

19. Yes 

True/False 8.2 

(a) False
(b) True
(c) False
(d) True
(e) False
(f) False

Exercise Set 8.3 


1 . 


a. $(T_2 \circ T_1)(x, y) = (2x - 3y, 2x + 3y)$

b. $(T_2 \circ T_1)(x, y) = (4x - 12y, 3x - 9y)$

c. $(T_2 \circ T_1)(x, y) = (2x + 3y, x - 2y)$

d. $(T_2 \circ T_1)(x, y) = (0, 2x)$

a. a + d

b. $(T_2 \circ T_1)(A)$ does not exist since $T_1(A)$ is not a 2 × 2 matrix.


5 r 2 (v) = Iv 


11 . 


T has no inverse. 


















b. 

7 7-1 

_ *l' 

*2 

_*3_ 

= 

8*‘ + 8*2"4 X3 
|*l+£*2 + J*3 

-8 x,+ f xa+ 4*3 

c. 

■*r 

*2 

_*3_ 


2* 1_ 2* 2+ 2* 3 

7" 1 

= 

4 X1 + 2 X2+ 2 X3 

^1 +^2“^3 





'*l‘ 


3^1 4= 3^2 — *3 

7 _1 

x 2 

= 

—2x\ — 2x 2 + x 2 


x 3 


—Ax\ — 5 x 2 4= 2x2 


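13. a. $a_i \neq 0$ for $i = 1, 2, 3, \ldots, n$

b. $T^{-1}(x_1, x_2, x_3, \ldots, x_n) = \left(\frac{1}{a_1}x_1, \frac{1}{a_2}x_2, \frac{1}{a_3}x_3, \ldots, \frac{1}{a_n}x_n\right)$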

a - Tf 1 (/>(*)) = ^1; 77 1 (*(*)) = ,(* - 1); (7 2 o 70 (*>(*)) = ±p(x - 1) 

17- (a) 0. - 1) 

(d) 7 _1 (2, 3) = 2 + x 

21. a. $T_1 \circ T_2 = T_2 \circ T_1$

b. $T_1 \circ T_2 \neq T_2 \circ T_1$

c. $T_1 \circ T_2 = T_2 \circ T_1$


True/False 8.3 

(a) True
(b) False
(c) False
(d) True
(e) False
(f) True


Exercise Set 8.4 


a. $\begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

a. $\begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}$

a. I" 0 0" 


8 4 
3 3 

a. $\begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 4 \\ 0 & 0 & 4 \end{bmatrix}$

b. $3 + 10x + 16x^2$
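Both parts are reproduced by the operator $T(p(x)) = p(2x + 1)$ on $P_2$ (standard basis $1, x, x^2$) applied to $p(x) = 2 - 3x + 4x^2$. Neither $T$ nor $p$ appears in this key; both are inferences, used here only to illustrate how the matrix is built. The columns of $[T]$ are the coordinate vectors of $T(1)$, $T(x)$ and $T(x^2)$.

```python
def poly_compose_affine(coeffs, a, b):
    """Coefficients of p(a x + b), given p's coefficients with the constant term first."""
    result = [0] * len(coeffs)
    power = [1]                              # (a x + b)^0
    for c in coeffs:
        for i, v in enumerate(power):
            result[i] += c * v
        nxt = [0] * (len(power) + 1)         # multiply the current power by (a x + b)
        for i, v in enumerate(power):
            nxt[i] += b * v
            nxt[i + 1] += a * v
        power = nxt
    return result

# Columns of [T]: images of 1, x, x^2 under the assumed T(p)(x) = p(2x + 1).
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
columns = [poly_compose_affine(e, 2, 1) for e in basis]
matrix = [list(row) for row in zip(*columns)]
print(matrix)
print(poly_compose_affine([2, -3, 4], 2, 1))   # assumed p(x) = 2 - 3x + 4x^2
```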


lT<Ji)] B = 


b. 


T(y 1 ) 


44 

)- 


[T(v 2>] B -- 


T(y 2 ) = 


18 I 

7 7 

107 24 

7 7 


c. 
























d. 


19 

7 

83 

' 7 


11 . 


a. 

T 


3" 


-r 

[T(vi)] b = 

2 

, [T(v 2 )]b = 

0 

. [r(v 3 )] B = 

5 


6 


-2 


4 


13. 


b. $T(\mathbf{v}_1) = 16 + 51x + 19x^2$, $T(\mathbf{v}_2) = -6 - 5x + 5x^2$, $T(\mathbf{v}_3) = 7 + 40x + 15x^2$

«• 7( fl0 + fll x + fl2 x 2 )= 239^0-16^ + 289^ 2 + 201«q - 111^ + 247« 2 ^ + 61^-31^ + 107^ , 

d. $T(1 + x^2) = 22 + 56x + 14x^2$


[ T 2 o T\] b > b = 


b. [T2 q T\\b\B = V^2]b\B ,, \T\]b ,, ,B 


~o o' 
6 0 

0 -9 

> IT2]b',S" = 

'0 0 o' 

3 0 0 

0 3 0 

- [7*1 ]b",B = 

1 - 

0 co c 

1 

CM O C 

0 0 


0 0 3 




19. 


b. 


0 0 
0 0 


0 1 0 


0 0 


d. 

'2 1 O' 

4' 


14" 

14<? 2 * - Sxe 2x - 20x 2 e 2x since 

0 2 2 

6 

= 

-8 


0 0 2 

-10 


-20 


21 . 


a, Bt t Bn 

b. B', B w 


True/False 8.4 

(a) False
(b) False
(c) True
(d) False
(e) True

Exercise Set 8.5 

1 . 

[T] B= l I?]. Wb,= 


3_ _56 
11 11 
_2_ J_ 
'll 11 


3. 


[T] b = 


[T] b = 


[T] b = 


1 _i_" 


13 25 

f2 

f2 


II /2 

ll/2 

1 1 

5 

9 

f2 

f2 


11/2 

11/2 


1 0 0 
0 1 0 
0 0 0 

2 _2 
3 9 

1 1 

2 3 


[*■]» = 


1 0 0 
0 1 1 
0 0 0 


[T)b,-- 


=p 

[o lj 


11 . 


-{[4 m 

- 3-^21 


—3 4° ^2\ 


Bt = 



















































13. 


a. λ = −4, λ = 3

b Basis for eigenspace corresponding to A = — 4: —2 + x 4 = x^', basis for eigenspace corresponding toA = 3: 5 — 2x =F x 2 


The choice of an appropriate basis can yield a better understanding of the linear operator. 

True/False 8.5 

(a) False
(b) True
(c) True
(d) True
(e) True
(f) False
(g) True
(h) False

Chapter 8 Supplementary Exercises 

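1. No. Because B ≠ 0, $T(\mathbf{x}_1 + \mathbf{x}_2) = A(\mathbf{x}_1 + \mathbf{x}_2) + B \neq (A\mathbf{x}_1 + B) + (A\mathbf{x}_2 + B) = T(\mathbf{x}_1) + T(\mathbf{x}_2)$, and if $c \neq 1$, then $T(c\mathbf{x}) = cA\mathbf{x} + B \neq c(A\mathbf{x} + B) = cT(\mathbf{x})$.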

$T(\mathbf{e}_3)$ and any two of $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, and $T(\mathbf{e}_4)$ form bases for the range; (−1, 1, 0, 1) is a basis for the kernel.

\y Rank = 3, nullity = 1 

7. a. Rank(T) = 2 and nullity(T) = 2
T is not one-to-one. 
ip Rank = 3, nullity = 1 


13. 


$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$


15. 



'-4 

0 


IT)b, 

= 

1 

0 




0 

1 

17. 



1 - 

■1 


[T] b -- 


0 

1 




1 

0 

19. 

(b) 

II 

X, 


(c) 

II 

2 

V\ 

21. 

(d) 

The points ; 

25. 

0 0 

0 ■ 



1 0 0 
0 1 0 


0 0 — 


0 0 0 


tf + 1 


Exercise Set 9.1 
1. $x_1 = 2$, $x_2 = 1$

3. $x_1 = 3$, $x_2 = -1$

5. $x_1 = -1$, $x_2 = 1$, $x_3 = 0$
7. $x_1 = -1$, $x_2 = 1$, $x_3 = 0$
9. $x_1 = -3$, $x_2 = 1$, $x_3 = 2$, $x_4 = 1$

11. n 


A = LU = 


2 0 0 

-2 1 0 

2 0 1 


1 I 
0 0 
0 0 


b. 


A = L { DU { = 









1 

2 

r 

p 

1 

0 

o' 

"2 

0 

o' 

1 

-1 

1 

0 

0 

1 

0 

0 

0 

1 

1 

0 

1 

0 

0 

1 

_0 

0 

1 





















c. 

1 

0 

0" 

'2 

1 

-l" 

a = l 2 u 2 = 

-1 

1 

0 

0 

0 

1 


1 

0 

1 

0 

0 

1 


13. 

"l 

0 

0~ 

'3 

0 

O' 

'1 

-4 

2' 

A = 

0 

1 

0 

0 

2 

0 

0 

1 

0 


2 

-2 

1 

0 

0 

1 

0 

0 

1 


15. x , = 2l _M „_i2 

1 17' 2 17' 3 17 


17. 







1 

_I ol 



'1 

0 

0" 

'3 

0 

O' 

3 


A = 

0 

0 

1 

0 

2 

0 

0 

1 I 

; 


0 

1 

0_ 

3 

0 

1 


2 









0 

0 1 


19 - (b> i 


b 


'1 

o' 

a 

b 

- 


1 

c 

d_ 


£ 

a 

1 

0 

ad — be 

a 


*1= *2 = -^. *3 = '- 


True/False 9.1 

(a) False
(b) False
(c) True
(d) True
(e) True

Exercise Set 9.2 

a. $\lambda_3$ dominant

b. No dominant eigenvalue

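3. $\mathbf{x}_1 \approx \begin{bmatrix} 0.98058 \\ -0.19612 \end{bmatrix}$; $\mathbf{x}_2 \approx \begin{bmatrix} 0.98837 \\ -0.15206 \end{bmatrix}$; $\mathbf{x}_3 \approx \begin{bmatrix} 0.98679 \\ -0.16201 \end{bmatrix}$; $\mathbf{x}_4 \approx \begin{bmatrix} 0.98715 \\ -0.15977 \end{bmatrix}$

dominant eigenvalue: $\lambda = 2 + \sqrt{10} \approx 5.16228$;
dominant eigenvector: $\begin{bmatrix} 1 \\ 3 - \sqrt{10} \end{bmatrix} \approx \begin{bmatrix} 1 \\ -0.16228 \end{bmatrix}$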
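A sketch of the power method with Euclidean scaling, as used in Exercise 3. It assumes $A = \begin{bmatrix} 5 & -1 \\ -1 & -1 \end{bmatrix}$ and $\mathbf{x}_0 = (1, 0)$. Neither is printed in this key; they were chosen because they reproduce the listed iterates $\mathbf{x}_1$ through $\mathbf{x}_4$, and the eigenvalues of this $A$ are $2 \pm \sqrt{10}$. After enough steps the Rayleigh quotient $\mathbf{x}^T A\mathbf{x}$ of the unit iterate approaches the dominant eigenvalue.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def power_method(A, x0, steps):
    """Power method with Euclidean scaling; returns the iterates and the final Rayleigh quotient."""
    x = list(x0)
    iterates = []
    for _ in range(steps):
        x = normalize(matvec(A, x))
        iterates.append(x)
    rayleigh = sum(a * b for a, b in zip(x, matvec(A, x)))   # x already has unit length
    return iterates, rayleigh

A = [[5.0, -1.0], [-1.0, -1.0]]     # assumed matrix, inferred from the printed iterates
iterates, lam = power_method(A, [1.0, 0.0], 25)

for k, x in enumerate(iterates[:4], start=1):
    print(k, [round(c, 5) for c in x])
print(lam, 2 + math.sqrt(10))
```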


*. = [/]’ aC 1) = 6 ;x 2 = [ J 

X4*[" a5 1 3488 ]. 6.60555; 

dominant eigenvalue: $\lambda = 3 + \sqrt{13} \approx 6.60555$;


A^ = 6 . 6 ; X 3 ? 


-0.53846 

1 


dominant eigenvector: 


/26*4/T3 
2 t /l3 


-0.47186 

0.88167 


9. 

13. 


/26 + 4/13 

X2 = [-a8]' X3RJ [-o 1 929] 
b. $\lambda^{(1)} = 2.8$, $\lambda^{(2)} \approx 2.976$, $\lambda^{(3)} \approx 2.997$

Dominant eigenvalue: A = 3; dominant eigenvector: 
0 . 1 % 

2.99993 


’[-!] 


0.99180 

1.00000 


Starting with 


Starting with 


, it takes 8 iterations. 


, it takes 8 iterations. 


0.987151 
■0.15977 J’ 


, A® ss 6.60550; 


Exercise Set 9.3 




















































1. 

T 


"2" 

h 0 = 

2 

, a 0 = 

0 


2 


3 


3 . 

0.39057 


"0.60971" 

h iss 

0.65094 

, ai ps 

0 


0.65094 


0.79262 


Sites 1 and 2 (tie); sites 3 and 4 are irrelevant 
Site 2, site 3, site 4; sites 1 and 5 are irrelevant 
Exercise Set 9.4 

1. a. ≈ 0.067 second

b. ≈ 66.68 seconds

c. ≈ 66,668 seconds, or about 18.5 hours

9.52 seconds 

b. ≈ 0.0014 second

c. ≈ 9.52 seconds

d. ≈ 28.6 seconds

6.67 × 10⁵ s for forward phase, 10 s for backward phase
1334 

7. ≈ n² flops
9. 2n³ − n² flops
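The count $2n^3 - n^2$ is what the naive product of two $n \times n$ matrices costs. Each of the $n^2$ entries needs $n$ multiplications and $n - 1$ additions, so it costs $2n - 1$ flops. This assumes that Exercise 9 concerns that product, which this key does not restate. The sketch counts flops as it multiplies.

```python
def matmul_with_flops(X, Y):
    """Naive product of two n x n matrices; one flop per multiplication and per addition."""
    n = len(X)
    flops = 0
    Z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = X[i][0] * Y[0][j]          # first product: 1 multiplication
            flops += 1
            for k in range(1, n):
                acc += X[i][k] * Y[k][j]     # 1 multiplication + 1 addition
                flops += 2
            Z[i][j] = acc
    return Z, flops

for n in (1, 2, 5, 10):
    ones = [[1.0] * n for _ in range(n)]
    _, count = matmul_with_flops(ones, ones)
    print(n, count, 2 * n**3 - n**2)
```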


Exercise Set 9.5 

1 . 0 , 

3. ^5 


A = 


A = 


A = 


1 

1 

_ 2 _ 

& 

1 

2 

3 

1 

3 

_2 

3 


_l_ 

1 

_l_ 

2 

1 


{2 0 

C 1 O' 

0 /2 

[0 lj 


8 0 
0 2 


£ 

6 


0 

_L Ji 
{2 6 


11 . 


A = 


_1_ 0 


2 

fe 

j_ j__i_ 

^ 3/2 

1/3/2 


True/False 9.5 

(a) False
(b) True
(c) False
(d) False
(e) True
(f) False
(g) True

Exercise Set 9.6 


J_2_ 

f5 f5 
2 1 
'/5 /5 




1 

1 

[3/2 

0 

~f2 

f2 

0 

0 

1 

1 

0 

0_ 

f2 

f2 


<[3 0 

0 j/2 

0 0 


1 0 
0 1 






























1. 


[3/5] 


1 1 
'f2 f2 


4= 0 
ft 

1 1 

& f2 

<[3 {2 


3f2 




2 

3 

I 

3 

_2 

3 

J_ 

f3 

1 

/3 

‘/3 


\{3 0 

ri o' 

0 {2 

[0 ij 

J_ J_" 


72 \[2 



[1 0 ] + /2 


0 

J_ 

f2 

1 


[0 1] 


70,100 numbers must be stored; A has 100,000 entries 

True/False 9.6 

(a) True
(b) True
(c) False

Chapter 9 Supplementary Exercises 


1. 


2 0 

—2 1 

2 0 0 

1 2 0 

1 1 2 


-3 1 
0 2 
1 2 3 
0 1 2 
0 0 1 


A= 3, v = 


f2 

f2 


x 5 t 

X 5 f 


0.7100 

0.7041 

1 

0.9918 


[0.7071 
5 [o.7071 


_3— 0 JL 

f2 f2 

0 1 0 

-L 0 -4= 


f2 


2 0 
0 0 
0 0 


11. 





1 

1 







2 

2 




12 

0 

6 ' 


1 

1 




4 

-8 

10 


2 

2 

"24 

0 1 


4 

-8 

10 


1 

1 

_ 0 

12j 


12 

0 

6 


2 

2 








1 

1 







2 

2 




1 _ ]_ 

{2 {2 

'f2 f2 


Exercise Set 10.1 

1. a. y = 3x − 4

b. y = −2x + 1












































2. a. $x^2 + y^2 - 4x - 6y + 4 = 0$ or $(x - 2)^2 + (y - 3)^2 = 9$

b. $x^2 + y^2 + 2x - 4y - 20 = 0$ or $(x + 1)^2 + (y - 2)^2 = 25$
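A circle $x^2 + y^2 + Dx + Ey + F = 0$ through three noncollinear points can be found from a vanishing determinant. Equivalently, each point gives one linear equation in $D$, $E$, $F$, and the resulting 3 × 3 system can be solved directly. The sketch takes the linear-system route, using exact fractions. The three points are hypothetical samples chosen to lie on the circle of part (a); they are not necessarily the exercise's data.

```python
from fractions import Fraction

def solve3(M, rhs):
    """Solve a 3 x 3 linear system exactly by Gauss-Jordan elimination (assumes a unique solution)."""
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(3):
        pivot = next(r for r in range(col, 3) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]

# Hypothetical sample points, chosen to lie on (x - 2)^2 + (y - 3)^2 = 9.
points = [(2, 6), (5, 3), (2, 0)]
# Each point (x, y) gives D x + E y + F = -(x^2 + y^2).
D, E, F = solve3([[x, y, 1] for x, y in points], [-(x * x + y * y) for x, y in points])
print(D, E, F)
```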

3 x 2 4= 27:7 + .y 2 — 27 :+ i y = 0 (a parabola) 


4. 


a x + 2 i y+z = 0 
b — 7:4=7 — 2z + 1 = 0 


x y z 0 
*1 71 *\ 1 
*2 72 z 2 1 
7:3 73 z 3 1 


= 0 


b 7T + 2.y+z = 0 ;-t :+7 = 2z = 0 

a. $x^2 + y^2 + z^2 - 2x - 4y - 2z = -2$ or $(x - 1)^2 + (y - 2)^2 + (z - 1)^2 = 4$
b. $x^2 + y^2 + z^2 - 2x - 2y = 3$ or $(x - 1)^2 + (y - 1)^2 + z^2 = 5$


10 . 


= 0 


y x z x \ 

71 *1 1 

72 *2 *2 1 

73 *3 *3 1 

The equation of the line through the three collinear points 

12. 0 = 0 

The equation of the plane through the four coplanar points 

Exercise Set 10.2 

1- x\ = 2, 7:2 = y; maximum value of z = yy 

No feasible solutions 
Unbounded solution 

Invest $6000 in bond A and $4000 in bond B; the annual yield is $880. 

"7 0*S ooc 

■j- cup of milk, ounces of corn flakes; minimum cost = = 18.6& 

9 18 18 

$x_1 \geq 0$ and $x_2 \geq 0$ are nonbinding; $2x_1 + 3x_2 \leq 24$ is binding

b. 7T2<vforv< — 3 is binding and for v < =6 yields the empty set. 

7:2 < v for v < 8 is nonbinding and for v < 0 yields the empty set. 

550 containers from company A and 300 containers from company B; maximum shipping charges = $2110 

925 containers from company A and no containers from company B; maximum shipping charges = $2312.50 

0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost = 24.8¢

Exercise Set 10.3 

1 700 


2 . 


a 5 
b 4 

Ox, units; sheep, unit 

First kind, measure; second kind, measure; third kind, measure 


' 25 


25 


' 25 


7T1 : 


( a 2 + fl 3 +- + «>,)-<»1 ; Xj = a . _ X1 j = 2 _ 3 .„ 


n — 2 


Exercise 7(b); gold, 30-y minae; brass, 9y minae; tin, 14-!- minae; iron, 5y minae 

a. 5x + y + z − K = 0
x + 7y + z − K = 0
x + y + 8z − K = 0

b. x = (21/131)t, y = (14/131)t, z = (12/131)t, K = t, where t is an arbitrary number

Take t = 131, so that x = 21, y = 14, z = 12, K = 131.
c. Take t = 262, so that x = 42, y = 28, z = 24, K = 262.
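The one-parameter solution can be checked by solving the three equations of part (a), $5x + y + z = K$, $x + 7y + z = K$ and $x + y + 8z = K$, for a given value of $K = t$. The coefficient determinant is 262, which is nonzero, so the solution is unique. With exact fractions, $K = 1$ exposes the factors $21/131$, $14/131$, $12/131$, and $K = 131$ gives the smallest integer solution.

```python
from fractions import Fraction

def solve3(M, rhs):
    """Solve a 3 x 3 linear system exactly by Gauss-Jordan elimination (assumes a unique solution)."""
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(3):
        pivot = next(r for r in range(col, 3) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]

coeffs = [[5, 1, 1], [1, 7, 1], [1, 1, 8]]     # left-hand sides from part (a)
per_unit = solve3(coeffs, [1, 1, 1])           # K = t = 1
smallest = solve3(coeffs, [131, 131, 131])     # K = t = 131 clears the denominators
print(per_unit, smallest)
```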







7 9 

Legitimate son, 577staters; illegitimate son, 422staters 

Gold, 30-^- minae; brass, 9-^- minae; tin, 14-^- minae; iron, 5-^- minae 

First person, 45; second person, 37-1; third person, 22-^- 

Exercise Set 10.4 

a. $S(x) = -.12643(x - .4)^3 - .20211(x - .4)^2 + .92158(x - .4) + .38942$

b. S(.5) = .47943; error = 0%
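The "0%" in part (b) means the error rounds to zero at this precision. The check below assumes that the interpolated function is $\sin x$, which is an inference: $S(.4) = .38942 \approx \sin .4$ and $S(.5) \approx \sin .5$. It evaluates the cubic piece at $x = .5$ and compares the result with $\sin(.5)$.

```python
import math

def S(x):
    """Cubic piece from part (a), written in powers of (x - 0.4)."""
    d = x - 0.4
    return -0.12643 * d**3 - 0.20211 * d**2 + 0.92158 * d + 0.38942

approx, exact = S(0.5), math.sin(0.5)
percent_error = abs(approx - exact) / exact * 100
print(round(approx, 5), round(exact, 5), percent_error)
```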


a. The cubic runout spline
b. $S(x) = 3x^3 - 2x^2 + 5x + 1$

f 


S(x) = { 


- . 00000042(x 4-10) 3 


4= 

000214(x 4= 10) 

4= 

.99815, 

— 10 <x < 0 

00000024(x) 3 

,0000126(x) 2 

4= 

,000088(x) 

4= 

.99987, 

0 <x < 10 

- 00000004(x — 10) 3 - 

. 0000054(x - 10) 2 

- 

000092(x — 10) 

+ 

.99973, 

10 <x < 20 

00000022 (x — 20) 3 - 

0000066(x — 20) 2 

- 

.000212(^-20) 

4= 

.99823, 

20 <x < 30 


Maximum at ( x , S(x)) = (3.93, 1.00004) 


5. 

,00000009(x +10) 3 - 

0000121(x + 10) 2 

4= 

.000282(x 4= 10) 

4= 

.99815, 

— 10 <x < 0 

S(x) = { 

00000009(x) 3 

0000093(x) 2 

4- 

.00007000 

+ 

.99987, 

0 <x < 10 

. 00000004(x — 10) 3 - 

0000066(x — 10) 2 

- 

,000087(x — 10) 

+ 

.99973, 

10 <x < 20 


00000004(x — 20) 3 - 

0000053(x — 20) 2 

- 

.000207 (x — 20) 

4= 

.99823, 

20<x<30 


Maximum at ( x , S(x)) = (4.00, 1.00001) 


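a. $S(x) = \begin{cases} -4x^3 + 3x, & 0 \le x \le 0.5 \\ 4x^3 - 12x^2 + 9x - 1, & 0.5 \le x \le 1 \end{cases}$

b. $S(x) = \begin{cases} 2 - 2x, & 0.5 \le x \le 1 \\ 2 - 2x, & 1 \le x \le 1.5 \end{cases}$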
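The two cubic pieces in part (a) can be checked against the defining conditions of a natural cubic spline. $S$, $S'$ and $S''$ must agree at the interior knot $x = 0.5$, and $S''$ must vanish at both ends. The data points $(0, 0)$, $(0.5, 1)$, $(1, 0)$ used below are inferred by evaluating $S$; they are not given in this key.

```python
from fractions import Fraction

# Pieces of S from part (a), as coefficient lists with the constant term first.
left = [Fraction(0), Fraction(3), Fraction(0), Fraction(-4)]     # -4x^3 + 3x on [0, 0.5]
right = [Fraction(-1), Fraction(9), Fraction(-12), Fraction(4)]  # 4x^3 - 12x^2 + 9x - 1 on [0.5, 1]

def deriv(p):
    """Coefficients of p'(x)."""
    return [i * c for i, c in enumerate(p)][1:]

def evaluate(p, x):
    return sum(c * x**i for i, c in enumerate(p))

def up_to_second(p):
    return [p, deriv(p), deriv(deriv(p))]

knot = Fraction(1, 2)
L, R = up_to_second(left), up_to_second(right)
matches = [evaluate(L[k], knot) == evaluate(R[k], knot) for k in range(3)]   # S, S', S'' at x = 0.5
values = [evaluate(left, 0), evaluate(left, knot), evaluate(right, 1)]      # S at x = 0, 0.5, 1
end_conditions = [evaluate(L[2], 0), evaluate(R[2], 1)]                     # S''(0), S''(1)
print(matches, values, end_conditions)
```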

The three data points are collinear. 


(b) r 


4 

1 

0 

0 • ■ 

■ • 0 

0 

0 

l" 

’ M i 


y n - i 

2/1 

4= 

72 


1 

4 

1 

0 ■ ■ 

■ ■ 0 

0 

0 

0 

m 2 


7 i 

2/2 

+ 

73 


0 

1 

4 

1 • ■ 

■ ■ 0 

0 

0 

0 

m 3 

6 

72 

2/3 

4= 

74 











h 2 






0 

0 

0 

0 ■ ' 

■ • 0 

1 

4 

1 

Mn- 2 


y n - 3 

- 2/„_ 2 

4- 

7m—1 


1 

0 

0 

0 ■ ' 

■ • 0 

0 

1 

4 

M n - 1 


y n - 2 

- 2/„_i 

4= 

71 


'2 

1 

0 

0 ■ ' 

■ ■ 0 

0 

0 

f 

’ Mi 


- 

hy\ - 


71 4 

72 

1 

4 

1 

0 ■ ' 

■ • 0 

0 

0 

0 

m 2 



y\ - 


272 4 

73 

0 

1 

4 

1 • ■ 

■ • 0 

0 

0 

0 

m 2 

6 


/2 - 


273 + 

74 










h 2 






0 

0 

0 

0 • ' 

■ • 0 

0 

4 

1 

M n - 1 



yn- 2 - 

27 

P M-1 ■+ 

7m 

0 

0 

0 

0 • • 

■ ■ 0 

1 

1 

2 

M n 



y n - 1 - 


7m 4 

hy„ 


Exercise Set 10.5 

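1. a. $\mathbf{x}^{(1)} = \begin{bmatrix} .4 \\ .6 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} .46 \\ .54 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} .454 \\ .546 \end{bmatrix}$, $\mathbf{x}^{(4)} = \begin{bmatrix} .4546 \\ .5454 \end{bmatrix}$, $\mathbf{x}^{(5)} = \begin{bmatrix} .45454 \\ .54546 \end{bmatrix}$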
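The iterates in Exercise 1(a) follow from $\mathbf{x}^{(k+1)} = P\mathbf{x}^{(k)}$. The sketch assumes $P = \begin{bmatrix} .4 & .5 \\ .6 & .5 \end{bmatrix}$ and $\mathbf{x}^{(0)} = (1, 0)$. Neither is printed in this key; they are inferred because they reproduce the listed vectors exactly. For a two-state chain with columns $(1 - a, a)$ and $(b, 1 - b)$, the steady-state vector is $\mathbf{q} = (b, a)/(a + b)$. Here that gives $(5/11, 6/11) \approx (.4545, .5455)$, the value the iterates approach.

```python
from fractions import Fraction

# Assumed data, inferred from the printed iterates (columns of P sum to 1).
P = [[Fraction(4, 10), Fraction(5, 10)],
     [Fraction(6, 10), Fraction(5, 10)]]
x = [Fraction(1), Fraction(0)]

def step(P, x):
    """One Markov step x -> P x."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]

history = []
for _ in range(5):
    x = step(P, x)
    history.append([float(v) for v in x])

# Steady state of a 2-state chain P = [[1 - a, b], [a, 1 - b]] is q = (b, a) / (a + b).
a, b = P[1][0], P[0][1]
q = [b / (a + b), a / (a + b)]
print(history)
print(q, step(P, q) == q)
```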


P is regular since all entries of P are positive; q = 


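a. $\mathbf{x}^{(1)} = \begin{bmatrix} .7 \\ .2 \\ .1 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} .23 \\ .52 \\ .25 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} .273 \\ .396 \\ .331 \end{bmatrix}$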


































b. P is regular, since all entries of P are positive; $\mathbf{q} = \begin{bmatrix} 22/72 \\ 29/72 \\ 21/72 \end{bmatrix}$


3. a. $\begin{bmatrix} 9/17 \\ 8/17 \end{bmatrix}$

b. $\begin{bmatrix} 26/45 \\ 19/45 \end{bmatrix}$

c. J_ 
19 

19 

12 

19 


P” = 


P n - 


■-(«” 

0 0 ] 

1 lj 


, « = 1, 2, — Thus, no integer power of P has all positive entries. 


as n increases, so 




for any X (P) as n increases. 


The entries of the limiting vector 


are not all positive. 


6. $P^2 = \begin{bmatrix} 1/2 & 1/4 & 1/4 \\ 1/4 & 1/2 & 1/4 \\ 1/4 & 1/4 & 1/2 \end{bmatrix}$ has all positive entries; $\mathbf{q} = \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix}$


7 IQ 
13 

54 1/6% in region 1, 16 2/3% in region 2, and 29 1/6% in region 3

Exercise Set 10.6 


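a. $\begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

b. $\begin{bmatrix} 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$

c. $\begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}$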





















P* P S 

3. » r x 


b. 1- 

- step: 

^1 

- Pi 



2- 

- step: 

Pi 

- P 4 

- Pi 




Pi 

^p 3 

- Pi 


3- 

- step: 

Pi 

- Pi 

- Pi 

- Pi 



Pi 

- Pi 

^P 4 

- Pi 



Pi 

- P 4 

- Pi 

~>Pl 

c. 1 “ 

- step: 

Pi 

—*P4 



2- 

- step: 

Pi 

- Pi 

* P4 


3- 

- step: 

Pi 

- Pi 

* Pi 

-fil 



Pi 

- P 4 

* Pi 

-^4 


(a) $\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}$

The ijth entry is the number of family members who influence both the ith and jth family members.
5. a. {P₁, P₂, P₃}

b. {P₃, P₄, P₅}

c - {Pi, P4. P&, P%) and {P4. P 5 . Pi) 

a. None

b. {P₂, P₄, P₆}

Power of P₁ = 5
Power of P₂ = 3
Power of P₃ = 4
Power of P₄ = 2

8. First, A; second, B and E (tie); fourth, C; fifth, D 
Exercise Set 10.7 
1. a . -5/8 

b. [0 1 0] 

c. [1 0 0 0] T 


0 0 11 
10 0 0 
0 10 1 
0 10 0 


Let A - 


-[i:] 

[0 1], q 


for example. 


[0 1 0], q 


p = [0 0 1], q 


v = 2 













p =[0100], q = 


* [5 31 * 

p = [s n q = 

b. 

* r2 ii 

P =[33} q = 

C ' P*=H 0], q* = 

-•=[?!} 

e. 

* Tj_ 101 

p [l3 13 J’ 

5. 

* m xi * 

p [20 20 J' q 

Exercise Set 10.8 


, v= — 2 


. 27 


70 

' 3 


v = 3 


19 

5 


q = 


29 
' 13 


20 


1. 


Use Corollary 10.8.4; all row sums are less than one. 
Use Corollary 10.8.5; all column sums are less than one. 


c. 

"2" 


1.9 

Use Theorem 10.8.3, with x = 

1 

>Cx = 

.9 


1 


.9 


E 2 has all positive entries. 

Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67 
5. $1256 for the CE, $1448 for the EE, $1556 for the ME

6. (b) 542. 

503 

Exercise Set 10.9 

1. The second class; $15,000

2. $223

3. 1:1.90:3.02:4.24:5.00

5 ^ / (g\ 1 + g2 _1 + ' ' ’ + Sft-1) 

6. 1 : 2 : 3 : ⋯ : (n − 1)

Exercise Set 10.10 

!• - To 1 1 0 _ 

0 0 11 

0 0 0 0 

oil 

2 2 


0 0 


1 1 
2 2 
0 0 0 0 

























'-2 -1 

-1 -2" 




-1 -1 

0 0 




3 3 

3 3 



d. 

0 .866 

1.366 .500" 



0 -.500 

.366 .866 



0 0 

0 

0 


(b) 






(0,0,0), (1,0,0), 

(1.1.0) 

(c) 

(0,0,0), (1, .6, 0), 

(1, 1.6,0), (0, 1,0) 

a. 

"l 0 0" 





0-10 

0 0 1 





-1 0 o' 
0 1 0 
0 0 1 


1 0 o' 

0 1 0 

0 0-1 


4. 


5. 


M i = 


o 

o 


' 1 1 

r 

r 

0 2 0 

0 0 i 

, M 2 = 

2 2 

0 0 • ■ 

2 

• • 0 

, m 3 = 


o 

o 

• • 0 

- 


1 0 
0 cos 20 
0 sin 20 


0 

—sin 20 
cos 20 


b. 



cos (— 45 ) 0 sin (— 45 ) 


'0 -1 0" 

m a = 

0 1 0 

, m 5 = 

1 0 0 


—sin (—45) 0 cos ( — 45) 


0 0 1 


$P' = M_5 M_4 M_3 (M_1 P + M_2)$


a. 

‘.3 

0 

0" 


"l 

0 

0 


'1 

1 • ■ 

■ • f 

M\ = 

0 

.5 

0 

, M 2 = 

0 

cos 45 

—sin 45 

, m 3 = 

0 

0 • ■ 

■ ■ 0 


0 

0 

1 


_0 

sin 45 

o 

cos 45 


0 

0 ■ ■ 

■ ■ 0 


m 4 = 

cos 35 0 sin 35 

0 1 0 

II 

* 

cos ( — 45) —sin ( — 45) 0 

sin ( — 45) cos ( — 45) 0 


—sin 35 0 cos 35 


O 

O 



o 

o 

o 


o 

o 

m 6 = 

0 0 • • • 0 

, m 7 = 

0 1 0 


11 • • • 1 


0 0 1 


b. $P' = M_7(M_5 M_4 (M_2 M_1 P + M_3) + M_6)$



cos $ 

0 

sin/? 


cos a 

—sin a 

0" 

*1= 

0 

1 

0 

. *2 = 

sin a 

cos a 

0 


—sin/? 

0 

cos $ 


0 

0 

1 


cos 9 

0 

sin0‘ 


cos a 

sin a 

o' 

r 3 = 

0 

1 

0 

, Ra = 

—sin a 

cos a 

0 , 


—sin 6 

0 

cos 9 


0 

0 

1 


cos $ 

0 

—sin/? 





r 5 = 

0 

1 

0 






sin/? 

0 

cos $ 






7. 














































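a. $\begin{bmatrix} 1 & 0 & 0 & x_0 \\ 0 & 1 & 0 & y_0 \\ 0 & 0 & 1 & z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

b. $\begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 9 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix}$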



Exercise Set 10.11 

1 . * 


t = 


0 

1 

4 

1 

4 

o 4 4 


1 

1 

0 




4 

4 

_ _ 


"0" 

0 

0 

1 

t\ 


1 

4 

1 

*2 

+ 

2 



0 

0 

*3 


0 

4 

1 4 


1 

1 

1 

0 



2 

4 

4 






V 


’ 3 " 


’ 7 " 


"l5~ 


0 


8 


16 


32 


64 


1 


5 


11 


23 


47 


2 

, = 

8 

, *®- 

16 


32 

, *®- 

64 

. t®-t= 

0 


1 


3 


7 


15 

1 


8 


16 


32 


64 


2 


5 


11 


23 


47 




8 


16 


32 


64 



t ®= 


for ti and 1 3 , — 12 . 9 %; f° r £2 an d 5 . 2 % 


2 1 
2 

3. 






[3 

5 2 

5 

4 2 

5 

4 

3l r 




4 4 

4 

4 4 

4 

4 

4j 



r 13 

18 

9 

22 

13 

7 

21 

16 

10] 

[is 

16 

16 

16 

16 

16 

16 

16 

16 J 


Exercise Set 10.12 



1. a. Successive iterates: (1.40000, 1.20000), (1.41000, 1.23000), (1.40900, 1.22700), (1.40910, 1.22730), (1.40909, 1.22727)

b. $\mathbf{x}^* = (1.40909, 1.22727)$; same as part (a)

c. Successive iterates: (9.55000, 25.65000), (.59500, −1.21500), (1.49050, 1.47150), (1.40095, 1.20285), (1.40991, 1.22972), (1.40901, 1.22703)


64 

J_ 

64 

J_ 

64 

J_ 

64 





























4. ** = (1.1), ^ = (2.0), *5 = (1.1) 
x 7 4=xg 4-X9 = 13.00 
X 4 + X 5 = 15.00 
xi +X 2 + *3 = 8.00 
82843(x 6 + xg) + .58579^9 = 14.79 
1.41421 (x 3 +xj + x 7 ) = 14.31 
. 82843 (x 2 + x 4 ) + . 58579* 1 = 3.81 
*3 4= *6 + *9 = 18.00 
*2 + *5 + *8 = 12.00 
*1 +X4 + X7 = 6.00 

. 82843 (x 2 + x 6 ) + .58579*3 = 10.51 
1.41421 (xi +X5 + X9) = 16.13 
.82843 (x 4 + x 8 ) + . 58579*7 = 7.04 

X'l =bxg + X9 = 13.00 
X4 + X5 + xg = 15.00 

xi +*2 +X3 = 8.00 

04289(x 3 4 = X5 + x 7 ) 4 = ,75000(xg + xg) 4= .61396*9 = 14.79 
91421(x 3 +X 5 +X7) + .25000(*2 + *4 + *^ + xg) = 14.31 
04289 (x 3 + X 5 + x 7 ) 4= .75000 (*2 4= x 4 ) 4= .61396xi = 3.81 

*3 +*6 + X9 = 18.00 

*2 4=*5 4=xg = 12.00 

+X4 + X7 = 6.00 

04289(xi +x 5 4-x 9 ) 4- ,75000(x 2 +x 6 ) 4- ,61396x 3 = 10.51 
.91421 (xi +X5 + X9) 4= .25000(*2 + X4 + X6 4-xg) = 16.13 
,04289(xi +X5 + X9) 4- .75000(x 4 + xg) 4 = .61396x 7 = 7.04 


Exercise Set 10.13 

1 . 

Ti\ 


‘(Mi ;k 


Si 


i = 1, 2, 3, 4, where the four values of 


Si 


and 


=ln(4) / In 


»)= 


s ≈ .47; $d_H(S) \approx \ln(4)/\ln(1/.47) \approx 1.8$; rotation angles: 0° (upper left); −90° (upper right); 180° (lower left); 180° (lower right)

3 . (0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), (1, 2, 0), (2, 1, 3), (2, 0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3) 

(i) s = 1/3; (ii) all rotation angles are 0°; (iii) $d_H(S) = \ln(7)/\ln(3) = 1.771\ldots$. This set is a fractal.

(i) s = 1/2; (ii) all rotation angles are 180°; (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

(i) s = 1/2; (ii) rotation angles: −90° (top); 180° (lower left); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

(i) s = 1/2; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.


s = .8509…, θ = −2.69°…


(0.766, 0.996) rounded to three decimal places 
7. $d_H(S) = \ln(16)/\ln(4) = 2$

8 - ln(4)/ln||j = 4.818... 

$d_H(S) = \ln(8)/\ln(2) = 3$; the cube is not a fractal.

k = 20; s = 1/3; $d_H(S) = \ln(20)/\ln(3) = 2.726\ldots$; the set is a fractal.


1 . 888 ... 















First iterate 


Second iterate 

Third iterate 
Fourth iterate 


$d_H(X) = \ln(2)/\ln(3) = 0.6309\ldots$

Area of $S_0$ = 1; area of $S_1$ = 8/9 = 0.888…; area of $S_2$ = (8/9)² = 0.790…; area of $S_3$ = (8/9)³ = 0.702…; area of $S_4$ = (8/9)⁴ = 0.624…
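The dimensions quoted in this exercise set all come from the same formula. For a self-similar set made of $k$ congruent pieces, each scaled by $s$, $d_H(S) = \ln(k)/\ln(1/s)$. The sketch recomputes the listed values from the $(k, s)$ pairs stated above. It also recomputes the areas, which shrink by the factor 8/9 at each step.

```python
import math

def similarity_dimension(k, s):
    """ln(k) / ln(1/s) for a self-similar set built from k copies scaled by s."""
    return math.log(k) / math.log(1 / s)

cases = {
    "7 pieces, s = 1/3": (7, 1 / 3),
    "3 pieces, s = 1/2": (3, 1 / 2),
    "cube, 8 pieces, s = 1/2": (8, 1 / 2),
    "20 pieces, s = 1/3": (20, 1 / 3),
    "2 pieces, s = 1/3": (2, 1 / 3),
}
for name, (k, s) in cases.items():
    print(name, similarity_dimension(k, s))

areas = [(8 / 9) ** n for n in range(5)]   # area shrinks by the factor 8/9 at each step
print(areas)
```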


Exercise Set 10.14 

1. Π(250) = 750, Π(25) = 50, Π(125) = 250, Π(30) = 60, Π(10) = 30, Π(50) = 150, Π(3750) = 7500, Π(6) = 12, Π(5) = 10
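The period of Arnold's cat map on an $N \times N$ pixel grid is the smallest $n$ with $C^n \equiv I \pmod N$, where $C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$. The brute-force sketch below computes that period and reproduces the listed values. As an extra check, it traces the orbit of the point $(5, 5)$ under $(x, y) \mapsto (x + y, x + 2y) \bmod 21$. That orbit matches the 8-point cycle printed later in this exercise set.

```python
def cat_map_period(N):
    """Smallest n >= 1 with C^n = I (mod N) for C = [[1, 1], [1, 2]]; assumes N >= 2."""
    def mul(X, Y):
        return [[(X[i][0] * Y[0][j] + X[i][1] * Y[1][j]) % N for j in range(2)] for i in range(2)]
    C = [[1, 1], [1, 2 % N]]
    identity = [[1, 0], [0, 1]]
    M, n = C, 1
    while M != identity:
        M = mul(M, C)
        n += 1
    return n

def orbit(point, N):
    """Orbit of a lattice point under (x, y) -> (x + y, x + 2y) mod N, listed until it repeats."""
    x, y = point
    pts = [(x, y)]
    while True:
        x, y = (x + y) % N, (x + 2 * y) % N
        if (x, y) == point:
            return pts
        pts.append((x, y))

for N in (5, 6, 10, 25, 30, 50, 125, 250, 3750):
    print(N, cat_map_period(N))
print(orbit((5, 5), 21))
```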

One 1-cycle: {(0. 0) }; one 3-cyele: j (|. o). (f, f), (o. §)}; two 4-eycles: {(f o). (f f), (|. o). (|, |)} and {(o. (§, £). (o. f), (£. §)}; 

two 12-c y cles: {(o, I), (1 §). (§. |), (£. 1} (|. £), (f f). (o. f). (f. £), (f . 1} (£, f). g. f ). (f. £)} - 

{(f °) (f 6 )’ (i I)’ (I’ 1} (f I)- (f s} (I’ °)’ (I• I)■ (f I)’ g ’ I)’ (I• I)’ (i 6)}• n(6) = 12 

3. (a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, …
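The terms in 3(a) follow the Fibonacci-type rule $x_{k+1} = x_k + x_{k-1} \pmod{15}$ with $x_0 = 3$, $x_1 = 7$. That rule is an inference from the numbers themselves. Generating the sequence shows that the pair $(3, 7)$ reappears after 40 terms, so the sequence is periodic with period 40.

```python
def fib_mod(a, b, N, count):
    """First `count` terms of x_{k+1} = x_k + x_{k-1} (mod N), starting from a, b."""
    seq = [a % N, b % N]
    while len(seq) < count:
        seq.append((seq[-1] + seq[-2]) % N)
    return seq[:count]

terms = fib_mod(3, 7, 15, 42)
print(terms)
```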

(5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5),... 


The first five iterates of ( a oi ' °) are ( 1 Q 1 ' 1 Q1 )’ (IQ! ' 1 Q1 )’ (1Q1 ' 1Q1 )’ ( \0\ ’ 101 ) 


, and 


(- 24 - 
Uoi ’ 


55 1 

101 


(b) 


The matrices of Anosov automorphisms are $\begin{bmatrix}3&2\\1&1\end{bmatrix}$ and $\begin{bmatrix}5&7\\2&3\end{bmatrix}$.
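This check assumes the usual criterion for a 2 × 2 Anosov automorphism: integer entries, determinant ±1, and no eigenvalue of absolute value 1. Under that assumption, both matrices can be checked numerically. The function name is hypothetical:

```python
import cmath


def is_anosov(a: int, b: int, c: int, d: int, tol: float = 1e-12) -> bool:
    """Check [[a, b], [c, d]] (integer entries): det = +/-1 and no eigenvalue with |lambda| = 1."""
    det = a * d - b * c
    if det not in (1, -1):
        return False
    tr = a + d
    # Eigenvalues are the roots of lambda**2 - tr*lambda + det = 0.
    disc = cmath.sqrt(tr * tr - 4 * det)
    eigenvalues = ((tr + disc) / 2, (tr - disc) / 2)
    return all(abs(abs(lam) - 1) > tol for lam in eigenvalues)


print(is_anosov(3, 2, 1, 1), is_anosov(5, 7, 2, 3))
```

A rotation such as [[0, 1], [-1, 0]] fails the test because its eigenvalues ±i lie on the unit circle.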


The transformation effects a rotation of S through 90° in the clockwise direction.


[Figure: the unit square S, with marked points (0, 0), (1, 0), (0, 1/2), (1, 1/2), (0, 1), (1, 1), divided into regions I, II, III, IV]


In region I: | b = 


[Figure: the image of S, with marked points (0, 0), (1/2, 0), (1, 0), (0, 1), (1/2, 1), (1, 1), divided into regions III' and I' (top) and IV' and II' (bottom)]


; in region II 


: b] = [_!]: integionm: [“] = [_}]; 


in region IV: 


-1 

-2 


and j^-, -jJ form one 2-cycle, and j-^-, j j and form another 2-cycle. 


Begin with a 101 × 101 array of white pixels and add the letter ‘A’ to it in black pixels. Apply the mapping to this image, which will scatter the black pixels
throughout the image. Then superimpose the letter ‘B’ in black pixels onto this image. Apply the mapping again and then superimpose the letter ‘C’ in black pixels
onto the resulting image. Repeat this procedure with the letters ‘D’ and ‘E’. The next application of the mapping will return you to the letter ‘A’, with the pixels for
the letters ‘B’ through ‘E’ scattered in the background.

Exercise Set 10.15 


1. 


GIYUOKEVBH 

SFANEFZWJH 


*4 

t invei 

*-U 


12 7 

23 15 


Not invertible 


d. Not invertible 

e. Not invertible 

15 12 
21 5 


M 



















WE LOVE MATH 


Deciphering matrix = $\begin{bmatrix}7&15\\6&5\end{bmatrix}$; enciphering matrix = $\begin{bmatrix}7&5\\2&15\end{bmatrix}$
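The two matrices are inverses modulo 26. For a 2 × 2 matrix, the inverse modulo m is (det A)^(-1) times the adjugate, reduced mod m. It exists exactly when det A is invertible mod m; for m = 26 that means det A must be odd and not a multiple of 13. This sketch assumes the usual 26-letter Hill-cipher modulus, and its helper names are ours:

```python
def inverse_mod_2x2(A, m=26):
    """Inverse of the 2x2 integer matrix A modulo m."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % m
    # Python 3.8+: pow(x, -1, m) is the modular inverse; it raises ValueError if gcd(x, m) != 1.
    det_inv = pow(det, -1, m)
    return [[(d * det_inv) % m, (-b * det_inv) % m],
            [(-c * det_inv) % m, (a * det_inv) % m]]


def matmul_mod(A, B, m=26):
    """Product of two 2x2 matrices, reduced mod m."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % m for j in range(2)] for i in range(2)]


enciphering = [[7, 5], [2, 15]]
deciphering = [[7, 15], [6, 5]]
print(inverse_mod_2x2(enciphering))           # [[7, 15], [6, 5]]
print(matmul_mod(deciphering, enciphering))   # [[1, 0], [0, 1]]
```

Here det = 7·15 - 5·2 = 95 ≡ 17 (mod 26), and 17 · 23 = 391 ≡ 1 (mod 26), so the factor 23 scales the adjugate.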


THEY SPLIT THE ATOM
I HAVE COME TO BURY CAESAR 
a. 010110001 


b. $\begin{bmatrix}0&1&1\\1&1&1\\1&0&1\end{bmatrix}$


A is invertible modulo 29 if and only if det(A) ≢ 0 (mod 29).

Exercise Set 10.16 


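2. $a_n = \tfrac{1}{4} + \left(\tfrac{1}{2}\right)^{n+1}(a_0 - c_0)$, $b_n = \tfrac{1}{2}$, $c_n = \tfrac{1}{4} - \left(\tfrac{1}{2}\right)^{n+1}(a_0 - c_0)$, $n = 1, 2, \ldots$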
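The closed forms a_n = 1/4 + (1/2)^(n+1)(a_0 - c_0), b_n = 1/2, c_n = 1/4 - (1/2)^(n+1)(a_0 - c_0) can be checked exactly. This sketch assumes the breeding scheme behind them is that every individual is mated with an Aa individual; the exercise statement is not reproduced here. Under that assumption the fractions evolve by a_n = a_{n-1}/2 + b_{n-1}/4, b_n = (a_{n-1} + b_{n-1} + c_{n-1})/2, c_n = b_{n-1}/4 + c_{n-1}/2:

```python
from fractions import Fraction as F


def step(a, b, c):
    """Genotype fractions (AA, Aa, aa) of the offspring when every parent is crossed with Aa."""
    return a / 2 + b / 4, (a + b + c) / 2, b / 4 + c / 2


def closed_form(a0, c0, n):
    """Proposed closed forms for generation n >= 1."""
    t = F(1, 2) ** (n + 1) * (a0 - c0)
    return F(1, 4) + t, F(1, 2), F(1, 4) - t


a, b, c = F(1, 5), F(1, 2), F(3, 10)  # an arbitrary starting distribution summing to 1
a0, c0 = a, c
for n in range(1, 8):
    a, b, c = step(a, b, c)
    assert (a, b, c) == closed_form(a0, c0, n)
print("closed forms agree for n = 1..7")
```

The check works because a_n + c_n = 1/2 for n ≥ 1 while a_n - c_n halves each generation.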


^2«+l = i =t ~ r\sn (2ao-bo-4c Q ) 
3 6(4) 

^2m+i = 3 “ yr( 2a Q -^0 - 4 co) 

c 2m+1 = 0 


* = 0 , 1 , 2 ,... 


a2 " = n + 6 (4)« (2a ° - b ° ~ 4c ») 
*2„ = ^ 

C2 ” = — *6(4)”" (2a ° — 4co) 


» = 1. 2, 


4 1 

Eigenvalues: Aj = 1, A2 = eigenvectors: ei = 

12 generations; .006% 


e 2 — 


1 

= 1 


2 + 3'^T [( - 3_ '^)(l + '^)" +1 + ( _3 + '^) (1_ '^)” +1 ] 

1 

3 


^rr[(l + /5)" + +(l-)/5)" + ] 


4 M + I 


>lM + l 


[(l + /5)" + (l-/?)”] 


I 

3 4: 

1 


„ + l [(1 + ^5) +(1-/5) ] 

^r[(i + /5)" +1 + (i-i/5)" +1 ] 


An+l 


^+^--± f 2 [(- 3 - E)o + \ E )” + +(-3+1/5)0-/?) 


,«+L 


$\begin{bmatrix}1&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&1\end{bmatrix}$


Exercise Set 10.17 

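1. a. $\lambda_1 = \tfrac{3}{2}$, $\mathbf{x}_1 = \ldots$

b. $\mathbf{x}^{(1)} = \begin{bmatrix}100\\50\end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix}175\\50\end{bmatrix}$, $\mathbf{x}^{(3)} = \ldots$, $\mathbf{x}^{(4)} = \begin{bmatrix}382\\125\end{bmatrix}$, $\mathbf{x}^{(5)} = \begin{bmatrix}570\\191\end{bmatrix}$

c. $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)} = \begin{bmatrix}857\\285\end{bmatrix}$, $\mathbf{x}^{(6)} \approx \lambda_1\mathbf{x}^{(5)} = \begin{bmatrix}855\\287\end{bmatrix}$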
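The vectors for this Leslie-matrix exercise (100, 50), (175, 50), (382, 125), (570, 191), and (857, 285) are consistent with the Leslie matrix L = [[1, 3/2], [1/2, 0]], whose dominant eigenvalue is 3/2, applied to x(0) = (100, 0). Each age class is rounded to the nearest whole animal (halves rounded up) after every step. The matrix, the starting vector, and the rounding rule are our inference from the listed numbers and are not given here. A sketch that reproduces them under those assumptions:

```python
from fractions import Fraction
import math

# Hypothetical Leslie matrix inferred from the listed vectors: a1 = 1, a2 = 3/2, b1 = 1/2.
L = [[Fraction(1), Fraction(3, 2)],
     [Fraction(1, 2), Fraction(0)]]


def round_half_up(v: Fraction) -> int:
    """Round to the nearest integer, sending halves upward (87.5 -> 88)."""
    return math.floor(v + Fraction(1, 2))


def step(L, x):
    """One time step x -> Lx, rounded to whole animals."""
    return [round_half_up(sum(L[i][j] * x[j] for j in range(len(x)))) for i in range(len(L))]


def trajectory(L, x0, n):
    """Return [x(0), x(1), ..., x(n)]."""
    xs = [list(x0)]
    for _ in range(n):
        xs.append(step(L, xs[-1]))
    return xs


xs = trajectory(L, [100, 0], 6)  # x(0) = (100, 0) is also inferred
for k, x in enumerate(xs):
    print(f"x({k}) = {x}")
lam1 = Fraction(3, 2)
print("lambda_1 * x(5) =", [round_half_up(lam1 * v) for v in xs[5]])
```

Exact fractions and a custom rounding rule are used because Python's built-in `round` rounds halves to even (`round(856.5)` gives 856), which would not reproduce 857.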


2.375 

1.49611 

Exercise Set 10.18 














































1. 


Yield = 33 1/3% of population; x₁ =


Yield = 45.8% of population; x₁ =


; harvest 57.9% of youngest age class


xi = 


1.000 


2.090 

.845 


.845 

.824 


.824 

.795 


.795 

.755 


.755 

.699 

, Ixi = 

.699 

.626 

.626 

.532 


.532 

0 


.418 

0 


0 

0 


0 

0 


0 


1.090 I .418 igg 
7.584 
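Under uniform harvesting, the same fraction h of every age class is removed each period. A sustainable policy then requires (1 - h)λ₁ = 1, so h = 1 - 1/λ₁. As an illustration, this sketch assumes the first exercise concerns a 2 × 2 Leslie matrix with dominant eigenvalue λ₁ = 3/2, which is what a 33 1/3% yield implies. The matrix below is a hypothetical one with that eigenvalue, and λ₁ is estimated by power iteration:

```python
def dominant_eigenvalue(L, iters: int = 200) -> float:
    """Power iteration for a Leslie matrix whose iterates stay positive."""
    n = len(L)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        lam = total / sum(x)        # growth factor of the total population
        x = [v / total for v in y]  # renormalize so the entries sum to 1
    return lam


# Hypothetical 2 x 2 Leslie matrix with dominant eigenvalue 3/2.
L = [[1.0, 1.5],
     [0.5, 0.0]]
lam1 = dominant_eigenvalue(L)
h = 1 - 1 / lam1  # fraction harvested from every age class
print(f"lambda_1 = {lam1:.6f}, h = {h:.4%}")
```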


4 hj=(R—\)f (ajb\b 2 ■ 

5 j h _ «1 + a 2 b\ + • ■ • 4- {aj-\b\b 2 • • • 6y_ 2 ) ~ 1 


tf/6162 ■ ■ ■ 6/_1 + 
Exercise Set 10.19 
1. 

3 

2 , 7 2 

3 


bl-1 + • • ■ +a n bib 2 • ■ 
-l6l6 2 ‘ ' ‘£ 7 - 2 ) ~ 1 

+ «/- l 6 i 6 2 - ■ ■ 6./_2 


5-*- 


-7 COS + -7 cos 7 ^ 


— cos — t 

a 2 r 


2tt t , 1 • 4jt f , 1 • 6 tt , , 1 • 8 ?r 
sin —t + — sin -^-t + — sin + — sin -^-t I 


-?(■ 

i r+ 2 

T ST i 1 2 vt - 1 ... 6 ^ . 1 lOirt 

-—■ cos —zr + -7 cos -=- + —- cos _ 

6 2 7 10 2 7 


^ — 4 = i sin £ — cos 2t — cos 4^ 
^2 3?r 15?r 

4 , 


cos 3^ — ■ 


1 


(2»-l)(2»4-l) 
1 


cos nt J 
2ni:t 


* 2 2 2 


Exercise Set 10.20 
1. 


( 2 »> 


2 cos T 


a. Yes; v = yvi 4= -|v 2 4= -|v 3 
No; v = ~v 1 -h jV2 — yv 3 
c - Yes; v = ^v\ 4 = ^v 2 + 0 v 3 
d- Yes; v = jjvi + -^-v 2 + -jjv 3 

m = number of triangles = 7, n = number of vertex points = 7, k = number of boundary vertex points = 5; Equation (7) is 7 = 2(7) - 2 - 5.
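The relation m = 2n - 2 - k checked above follows from Euler's formula. This derivation assumes the triangulated region is bounded by a single polygon whose k edges join the k boundary vertex points, with E denoting the number of edges:

```latex
\begin{aligned}
n - E + (m + 1) &= 2 &&\text{(Euler's formula; the $+1$ counts the outer face)}\\
3m &= 2E - k &&\text{(interior edges border two triangles, the $k$ boundary edges one)}\\
2n - (3m + k) + 2m + 2 &= 4 &&\text{(substitute } E = (3m + k)/2 \text{ and multiply by 2)}\\
m &= 2n - 2 - k, &&\text{e.g. } 2(7) - 2 - 5 = 7.
\end{aligned}
```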
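3. $\mathbf{w} = M\mathbf{v} + \mathbf{b} = M(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) + (c_1 + c_2 + c_3)\mathbf{b}$

$= c_1(M\mathbf{v}_1 + \mathbf{b}) + c_2(M\mathbf{v}_2 + \mathbf{b}) + c_3(M\mathbf{v}_3 + \mathbf{b}) = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$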
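The identity in Exercise 3 relies on c₁ + c₂ + c₃ = 1, which is what allows b to be written as (c₁ + c₂ + c₃)b. Below is a quick exact check with arbitrary, hypothetical data, plus a coefficient triple that does not sum to 1 to show that the condition matters:

```python
from fractions import Fraction as F


def apply_affine(M, b, v):
    """Return Mv + b for a 2x2 matrix M and 2-vectors b, v."""
    return tuple(M[i][0] * v[0] + M[i][1] * v[1] + b[i] for i in range(2))


def combine(coeffs, points):
    """Return c1*p1 + c2*p2 + c3*p3 for 2-D points."""
    return tuple(sum(c * p[k] for c, p in zip(coeffs, points)) for k in range(2))


# Hypothetical data: any M, b, triangle, and coefficients summing to 1 will do.
M = [[F(2), F(1)], [F(-1), F(3)]]
b = (F(5), F(-2))
vs = [(F(0), F(0)), (F(4), F(1)), (F(1), F(3))]
ws = [apply_affine(M, b, v) for v in vs]  # w_i = M v_i + b

c = (F(1, 2), F(1, 3), F(1, 6))
print(apply_affine(M, b, combine(c, vs)) == combine(c, ws))  # True

c_bad = (F(1, 2), F(1, 2), F(1, 2))  # sums to 3/2, so the identity fails
print(apply_affine(M, b, combine(c_bad, vs)) == combine(c_bad, ws))  # False
```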


4. [Figure: the requested triangulations; part (b) shows vertex points labeled v4, v5, v6]


a. 

M = 

~1 

2~ 


b = 

T 




_0 

1_ 



2 


b. 

M = 

"3 

- 

r 

, b = 

J 1 

O' 



_1 

1_ 


L 

1_ 

c. 

M = 

~1 

0~ 


b = | 


2' 



_0 

1_ 




3_ 

d. 


"1 

1 




1 ' 


M = 

2 


, 

b = 


2 



_ 2 

0 




• 1 _ 


a. Two of the coefficients are zero.

b. At least one of the coefficients is zero.
c. None of the coefficients are zero.
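These cases describe where v lies relative to the triangle with vertices v1, v2, v3. With coefficients c1 + c2 + c3 = 1, v lies in the closed triangle exactly when every ci ≥ 0. Two zero coefficients put v at a vertex. At least one zero puts it on the boundary (exactly one zero means the inside of an edge). No zeros means the interior. A sketch in exact arithmetic (helper names are ours) solves v - v3 = c1(v1 - v3) + c2(v2 - v3) by Cramer's rule:

```python
from fractions import Fraction as F


def barycentric(v, v1, v2, v3):
    """Solve v = c1 v1 + c2 v2 + c3 v3 with c1 + c2 + c3 = 1."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = v, v1, v2, v3
    det = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    if det == 0:
        raise ValueError("v1, v2, v3 are collinear")
    c1 = F((x - x3) * (y2 - y3) - (x2 - x3) * (y - y3), det)
    c2 = F((x1 - x3) * (y - y3) - (x - x3) * (y1 - y3), det)
    return c1, c2, 1 - c1 - c2


def classify(v, v1, v2, v3):
    """Locate v relative to the closed triangle v1 v2 v3."""
    cs = barycentric(v, v1, v2, v3)
    if any(c < 0 for c in cs):
        return "outside"
    zeros = sum(c == 0 for c in cs)
    return {0: "interior", 1: "on an edge", 2: "at a vertex"}[zeros]


tri = ((0, 0), (3, 0), (0, 3))
for v in [(1, 1), (1, 0), (3, 0), (3, 3)]:
    print(v, barycentric(v, *tri), classify(v, *tri))
```

For this triangle, the point (1, 1) is the centroid, with all three coefficients equal to 1/3.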

a. $\tfrac{1}{3}\mathbf{v}_1 + \tfrac{1}{3}\mathbf{v}_2 + \tfrac{1}{3}\mathbf{v}_3$

b ' [ 7 ] 















